Python good for data mining?

Dennis Lee Bieber · Nov 6, 2007

The 20th century perspective found it more flexible to base
everything on set theory (or category theory or similar)
which is fundamentally relational. Historically
hierarchical/network databases preceded RDBMSs because they
are fundamentally more efficient. Unfortunately, they are
also fundamentally more inflexible (it is generally agreed).

<heh> My college database textbook covered the subject with (in
order) a chapter on Hierarchical, then Network, and then Relational (as
a theoretical model).

The next edition of the textbook started with Relational, and
treated Hierarchical and Network as historical artifacts.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

Jens · Nov 7, 2007

Jens,

You might be interested in this book: http://www.oreilly.com/catalog/9780596529321/index.html
which is new; I just ordered my copy. From the contents shown online,
it has a lot of applicability to data mining using Python. Although
its primary topic is data mining the web, it also covers analyzing the
data, etc.

Ron Stephens

I'm a big fan of O'Reilly's books. This one looks very promising, and
I have just added it to my wishlist. Thanks for the tip Ron!

As for databases - I think it's a good idea to support a good selection
of data sources. Just now, one of my main concerns is to make a
structure that is flexible enough, as well as being easy to use.
Flexibility and usability!

Francesc · Nov 9, 2007

[I've just seen this thread. Although it might be a bit late, let me
offer a couple of clarifications.]

Hi Maarten,

I respectfully disagree that HDF5 is not a DB. It's true that HDF5,
prima facie, is not relational but rather hierarchical.

Yeah. This largely depends on what we understand by a DB. Lately,
RDBMSs are used everywhere, and we tend to believe that they are the
only entities that can truly be called DBs. However, in a less
restrictive view, even a text file can be considered a DB (in fact,
many DBs have been implemented using text files as a base). So, I
wouldn't say that HDF5 is not a DB, but just that it is not an RDBMS.
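
As a minimal sketch of the "text file as a DB" point above, Python's standard csv module is enough to run a simple filtered query over plain text. The column names, sample rows, and the select() helper are made up for illustration; this is not how any particular DB is implemented.

```python
import csv
import io

# Hypothetical sample data; in practice this would be a file on disk.
TEXT_DB = """name,kind,size_mb
runs.h5,hierarchical,2048
users.db,relational,12
notes.txt,text,1
"""


def select(text, predicate):
    """Return the rows (as dicts) for which predicate(row) is true."""
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if predicate(row)]


# "Query": every entry larger than 10 MB.
big = select(TEXT_DB, lambda row: int(row["size_mb"]) > 10)
names = [row["name"] for row in big]
```

Every query here scans the whole text, which is the usual trade-off of a flat file compared with an indexed store.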

Hierarchical is truly a much more natural/elegant[1] design from my
perspective. HDF has always had meta-data capabilities and with the
new 1.8beta version available, it is increasing its ability with
'references/links' allowing for pure/partial relational datasets,
groups, and files as well as storing self implemented indexing.

The C API is obviously much more low level, and PyTables does not yet
support these new features.

That's correct. And although it is quite possible that we, the PyTables
developers, will end up incorporating some relational features into it,
we also recognize that PyTables does not intend (and it was never in our
plans) to be a competitor to a pure RDBMS, but rather a *teammate* (as
is clearly stated on the www.pytables.org home page).

In our opinion, PyTables opens the door to a series of capabilities
that are not found in a typical RDBMS, like hierarchical classification,
multidimensional datasets, powerful table entities that are able to
deal with multidimensional columns or nested records, but most
especially, the ability to work with extremely large amounts of data in
a very easy way, without having to give up first-class speed.

[1] Anything/everything that is physical/virtual, or can be conceived
is hierarchical... if the system itself is not random/chaotic. That's a
lovely revelation I've had... EVERYTHING is hierarchical. If it has
context it has hierarchy.

While I agree that this sentence contains some truth, it is also known
that a lot of things (perhaps many more than we think) in the universe
fall directly into the domain of the random/chaotic.

IMO, the wisest path is to recognize the strengths (and
weaknesses) of each approach and use whatever best fits your
needs. If you need the best of both, then go ahead and choose an RDBMS
in combination with a hierarchical DB, and use the powerful
capabilities of Python to get the most out of them.
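
A rough sketch of that combination, using only the standard library: sqlite3 plays the relational side, and a nested dict addressed by path stands in for the hierarchical store. The dict is purely an assumption for illustration (a real setup would more likely keep the bulk data in an HDF5 file), and the table layout, paths, and get_node() helper are hypothetical.

```python
import sqlite3

# Stand-in for a hierarchical store: nested groups addressed by path.
# A real setup would use an HDF5 file here; this dict is only illustrative.
hierarchy = {
    "experiments": {
        "run1": {"samples": [0.5, 1.5, 2.5]},
        "run2": {"samples": [10.0, 20.0]},
    }
}


def get_node(root, path):
    """Walk a '/'-separated path down the nested dict."""
    node = root
    for part in path.strip("/").split("/"):
        node = node[part]
    return node


# Relational side: a small catalog describing each dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (path TEXT PRIMARY KEY, operator TEXT)")
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?)",
    [
        ("/experiments/run1/samples", "alice"),
        ("/experiments/run2/samples", "bob"),
    ],
)

# Query the catalog relationally, then fetch the bulk data hierarchically.
rows = conn.execute(
    "SELECT path FROM datasets WHERE operator = ?", ("alice",)
).fetchall()
results = {path: get_node(hierarchy, path) for (path,) in rows}
conn.close()
```

The idea is that the small, frequently queried metadata lives in tables, while the large datasets stay in the hierarchy and are only touched once the query has picked out their paths.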

Cheers,

Francesc Altet
