(in memory) database

M

mark

Hi there,

I need to extract data from text files (~4 GB) on this data some
operations are performed like avg, max, min, group etc. The result is
formated and written in some other text files (some KB).

I currently think about database tools might be suitable for this. I
would just write the import from the text files and ... the tool does
the rest. The only problem I can imagine is that this would not be
fast enough. But I would give it a shoot.
Unfortunately I have only some knowledge of SQLite which is not an
option here.

Some additional requirements I can think of are:
- Python (I want to hone my programming skills too)
- Python-only (no C-lib) for simplicity (installation, portability).
Therefore SQLite is not an option
- must be fast
- I like SQL (select a, b from ...) this would be nice (row[..] + ...
is a little hard getting used to)

So far I found PyDBLite, PyTables, Buzhug but they are difficult to
compare for a beginner.

Cheers,
Mark
 
F

Fredrik Lundh

mark said:
> I need to extract data from text files (~4 GB) on this data some
> operations are performed like avg, max, min, group etc. The result is
> formated and written in some other text files (some KB).

you could probably do all that with data stream processing, but if you
haven't worked with such algorithms, just stuffing it all in a database
is probably less work for you (if not for your CPU).
Unfortunately I have only some knowledge of SQLite which is not an
option here.

why is sqlite not an option? it's is bundled with Python these days,
and should be available (or trivial to install) on all major deployment
platforms.

</F>
 
P

Paul Boddie

Yes and no. My own experience with Debian packages is that with a
standard
apt-get install python2.5
an attempt to
import sqlite3
results in
ImportError: No module named _sqlite3

That's strange from the perspective of the Debian package information:

http://packages.debian.org/etch/python2.5
http://packages.debian.org/lenny/python2.5

Both have libsqlite3-0 as a dependency. On my Ubuntu system, the same
dependency applies.

I'm not sure Novell can help with the matter, though. ;-)
I recognize the error was resolved nearly two years ago,
but I, for one, don't understand how to express the resolution in
terms of Debian packages. Is there a way to install Python and have
it manage SQLite3 correctly withOUT configuring recent sources "by
hand"?

Which Debian version and which package repository? I imagine that
there may have been backports of Python 2.5 to Debian 3.1 (Sarge) and
earlier, but my own experience with sqlite prior to running Python 2.5
on Ubuntu involved use of the pysqlite2 module with Python 2.4
instead. Since Python 2.5 became the default on Ubuntu, I don't recall
having any problems with sqlite.

Paul
 
P

Paul Boddie

Let's take a definite example: I have a convenient
Ubuntu 8.04.1
The content of /etc/apt/sources.list is
debhttp://us.archive.ubuntu.com/ubuntuhardy main restricted
debhttp://us.archive.ubuntu.com/ubuntuhardy-updates main restricted
debhttp://us.archive.ubuntu.com/ubuntuhardy universe multiverse
debhttp://security.ubuntu.com/ubuntuhardy-security main restricted
I do
apt-get update
apt-get upgrade
apt-get install python2.5
then
# python2.5
Python 2.5 (r25:51908, Dec 11 2006, 21:09:56)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.5/sqlite3/__init__.py", line 24, in <module>
from dbapi2 import *
File "/usr/local/lib/python2.5/sqlite3/dbapi2.py", line 27, in <module>
from _sqlite3 import *
ImportError: No module named _sqlite3

How do you interpret this?

What do you get if you run this command...?

dpkg -s python2.5

For me, I get something which mentions the following:

Package: python2.5

[...]

Depends: python2.5-minimal (= 2.5.1-0ubuntu1.2), mime-support,
libbz2-1.0, libc6 (>= 2.5-0ubuntu1), libdb4.4,
libncursesw5 (>= 5.4-5), libreadline5 (>= 5.2),
libsqlite3-0 (>= 3.3.13), libssl0.9.8 (>= 0.9.8c-1)

Note the presence of the libsqlite3-0 package. In addition, you should
have the sqlite3 extension module somewhere:

locate sqlite3.so

This should tell you where the sqlite libraries are as well as where
the extension module is. For me, I get something which includes the
following:

/usr/lib/python2.5/lib-dynload/_sqlite3.so
/usr/lib/libsqlite3.so.0

Passing one of these to "dpkg -S" should say which package provided
it.

The strange thing is that the Ubuntu package information for your
version does mention the sqlite dependency and include the extension
module in the list of files:

http://packages.ubuntu.com/hardy/python2.5

You can run the following command to see whether your python2.5
package really provides the extension module:

dpkg --listfiles python2.5

Even if the sqlite library is installed, if that package doesn't
provide the extension module, something must be wrong with it because
it should be there.

Paul
 
P

Paul Boddie

[Lots of output suggesting correct package configuration]
I'm certainly perplexed, and welcome suggestions.

Maybe...

which python

I think Jean-Paul might be on to something with his response. Are we
referring to the system-packaged Python? There's always "python -v"
and/or "strace python" for full details of what might be happening
otherwise.

Paul
 
C

Cousin Stanley

.... .
Yes and no. My own experience with Debian packages
is that with a standard

apt-get install python2.5

an attempt to
import sqlite3

results in
ImportError: No module named _sqlite3
....

No problems here with Debian Lenny ....

All packages via .... apt-get install xxxx ....

$ uname -a
Linux em1 2.6.25-2-686 #1 SMP Fri Jul 18 17:46:56 UTC 2008 i686 GNU/Linux

$ dpkg -l | grep sqlite
ii libhk-classes-sqlite3 0.8.3-4 SQLite 3 driver plugin for hk_classes
ii libsqlite3-0 3.5.9-3 SQLite 3 shared library
ii python-pysqlite2 2.4.1-1 Python interface to SQLite 3
ii sqlite3 3.5.9-3 A command line interface for SQLite 3

$ py
Python 2.5.2 (r252:60911, Aug 8 2008, 09:22:44)
[GCC 4.3.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
 
C

Cousin Stanley

....
Yes and no. My own experience with Debian packages
is that with a standard
apt-get install python2.5
an attempt to
import sqlite3
results in
ImportError: No module named _sqlite3
....

From Kubuntu 8.04 ....

$ uname -a
Linux em1 2.6.24-19-generic #1 SMP
Wed Aug 20 22:56:21 UTC 2008 i686 GNU/Linux

$ dpkg -l | grep sqlite
ii libsqlite0 2.8.17-4build1 SQLite shared library
ii libsqlite3-0 3.4.2-2 SQLite 3 shared library

$ py25
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
It is now my estimation that the Force
is not currently with you .... :)
 
B

Bruno Desthuilliers

mark a écrit :
Hi there,

I need to extract data from text files (~4 GB) on this data some
operations are performed like avg, max, min, group etc. The result is
formated and written in some other text files (some KB).

I currently think about database tools might be suitable for this. I
would just write the import from the text files and ... the tool does
the rest. The only problem I can imagine is that this would not be
fast enough.

Is this an a priori, or did you actually benchmark and found out it
would not fit your requirements ?
But I would give it a shoot.
Unfortunately I have only some knowledge of SQLite which is not an
option here.
>
Some additional requirements I can think of are:
- Python (I want to hone my programming skills too)
- Python-only (no C-lib) for simplicity (installation, portability).
Therefore SQLite is not an option
- must be fast

These two requirements can conflict for some values of "fast".
- I like SQL (select a, b from ...) this would be nice (row[..] + ...
is a little hard getting used to)

So far I found PyDBLite, PyTables, Buzhug but they are difficult to
compare for a beginner.

Never used any of them - I have sqlite, mysql and pgsql installed on all
my machines -, so I can't help here.
 
M

M.-A. Lemburg

Hi there,

I need to extract data from text files (~4 GB) on this data some
operations are performed like avg, max, min, group etc. The result is
formated and written in some other text files (some KB).

I currently think about database tools might be suitable for this. I
would just write the import from the text files and ... the tool does
the rest. The only problem I can imagine is that this would not be
fast enough. But I would give it a shoot.
Unfortunately I have only some knowledge of SQLite which is not an
option here.

Some additional requirements I can think of are:
- Python (I want to hone my programming skills too)
- Python-only (no C-lib) for simplicity (installation, portability).
Therefore SQLite is not an option
- must be fast
- I like SQL (select a, b from ...) this would be nice (row[..] + ...
is a little hard getting used to)

So far I found PyDBLite, PyTables, Buzhug but they are difficult to
compare for a beginner.

You could use Gadfly for this since it is pure Python and provides
a standard Python DB-API interface:

http://gadfly.sourceforge.net/

(the C extensions are optional to speedup processing)

This is the SQL subset it supports:

http://gadfly.sourceforge.net/sql.html

Another option is SnakeSQL:

http://pythonweb.org/projects/snakesql/

but I've never used that one, so can't judge its quality.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Sep 01 2008)________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
 
Z

Zentrader

On Ubuntu you want to install something like python-sqlite (a search
for "python" should turn up everything). There are 2 parts to this,
SQLite and the python bindings to SQLite. So you seem to have SQLite
installed but not the Python bindings. Also, on some systems you have
to have python-sqlite installed, and then build python in order to use
it. I'm on Gentoo so I can't help you with Ubuntu, but here's
something I had lying around that uses a DB in memory (no idea why I
did this or if it makes sense but you're welcome to it
import sqlite3 as sqlite

..##----------------------------------------------------------------------
..def add_rec(cur, con, add_tuple):
.. print "add_rec", len(add_tuple)
.. cur.execute("insert into test_it values
(?, ?, ?, ?, ?, ?, ?, ?, ?)", add_tuple)
.. con.commit()
..
..##----------------------------------------------------------------------
..def print_all_recs(cur):
.. # Execute the SELECT statement:
.. print "Printing all recs"
.. cur.execute("select * from test_it")
..
.. # Retrieve all rows as a sequence and print that sequence:
.. recs_list = cur.fetchall()
.. for rec in recs_list:
.. print rec
.. print " stage, REAC_W =", rec[1], rec[5]
..
..##----------------------------------------------------------------------
..def add_test_data(cur, con):
.. # Stage REAC_PS WR(T)/P TTO
REAC_W W0 0wR(THCR)E/D PTO
.. data_list=[ ('2.0', 'Stage1', '0.509', '4.3010', '1602.30',
'0.515', '3.191', '2.8191', '29.7010'), \
.. ('2.0', 'Stage2', '0.488', '6.0074', '1470.43',
'0.500', '3.200', '3.9309', '20.4275'), \
.. ('2.0', 'Stage1', '0.524', '4.4623', '1602.30',
'0.560', '3.311', '2.9243', '29.7010'), \
.. ('2.0', 'Stage2', '0.579', '6.6682', '1444.78',
'0.700', '3.320', '4.3593', '18.9262'), \
.. ('3.0', 'Stage1', '0.524', '4.4623', '1602.30',
'0.560', '3.311', '2.9243', '29.7010'), \
.. ('3.0', 'Stage2', '0.579', '6.6682', '1444.78',
'0.700', '3.320', '4.3593', '18.9262'), \
.. ('3.5', 'Stage1', '0.525', '4.4695', '1602.30',
'0.563', '3.316', '2.9290', '29.7010') ]
..
.. for data_tuple in data_list :
.. add_rec(cur, con, data_tuple)
..
..##----------------------------------------------------------------------
..if __name__ == "__main__":
.. # Create a connection to the (memory) database file
.. con = sqlite.connect(':memory:')
..
.. # Get a Cursor object that operates in the context of Connection
con
.. cur = con.cursor()
..
.. cur.execute("CREATE TABLE test_it (number varchar, stage varchar,
REAC_PS varchar, WR_T_P varchar, TTO varchar, REAC_W varchar, W0
varchar, wR_THCR_E_D varchar, PTO varchar)")
..
.. add_test_data(cur, con)
.. print_all_recs(cur)
..
.##----------------------------------------------------------------------
.. print '\n-----SELECT * FROM test_it where number="2.0"-------'
.. recs_list=cur.execute('SELECT * FROM test_it where number="2.0"')
.. ctr=0
.. for row in recs_list:
.. print row
.. ctr += 1
.. print ctr, "recs found"
..
.. print '\n-----SELECT * FROM test_it where number="2.0" and
stage="Stage1"'
.. lookup_dic={"dic_num":"2.0", "dic_st":"Stage1"}
.. recs_list=cur.execute('SELECT * FROM test_it where number=:dic_num
and stage=:dic_st', \
.. lookup_dic)
.. ctr=0
.. for row in recs_list:
.. print row
.. ctr += 1
.. print ctr, "recs found"
 
P

Paul Boddie

On Ubuntu you want to install something like python-sqlite (a search
for "python" should turn up everything).  There are 2 parts to this,
SQLite and the python bindings to SQLite.  So you seem to have SQLite
installed but not the Python bindings.  Also, on some systems you have
to have python-sqlite installed, and then build python in order to use
it.

The python-sqlite and python-sqlite2 packages were needed for Python
2.4 - it was possibly python2.4-sqlite or python2.4-sqlite2 if you go
back a few versions - but Python 2.5 should depend on the sqlite
libraries and pull them in automatically:

http://packages.ubuntu.com/hardy/python2.5

I don't understand why Cameron has a different version of Python which
doesn't seem to have sqlite support enabled.

Paul
 
Z

Zentrader

I don't understand why Cameron has a different version of Python which
doesn't seem to have sqlite support enabled.

Agreed, but won't the package manager tell him if python-sqlite is
installed? That would be the next step since it appears that SQLite
intself is already installed. Since Ubuntu uses precompied binaries,
Python should be configured for SQLite which again leaves no python-
sqlite as the only possibility (yeah right). BTW Python is easy to
install manually.
 
P

Peter Otten

Zentrader said:
Agreed, but won't the package manager tell him if python-sqlite is
installed? That would be the next step since it appears that SQLite
intself is already installed. Since Ubuntu uses precompied binaries,
Python should be configured for SQLite which again leaves no python-
sqlite as the only possibility (yeah right). BTW Python is easy to
install manually.

When you install Python manually from source you need the header files for
sqlite3 to get sqlite3 support. These are in the libsqlite3-dev package.

I think you can distinguish a manually installed python from the packaged
one by the .../local/... in its path, e. g., on my machine

$ which python2.5 # in the distribution
/usr/bin/python2.5
$ which python2.6
/usr/local/bin/python2.6 # installed from source

I have installed libsqlite3-dev so I can't reproduce Cameron's error, but
here's a similar one for bsddb:

$ python2.5
Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:43)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.'/usr/lib/python2.5/bsddb/__init__.pyc'

$ python2.6
Python 2.6b2+ (trunk:65902, Aug 20 2008, 08:38:26)
[GCC 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.6/bsddb/__init__.py", line 58, in <module>
import _bsddb
ImportError: No module named _bsddb

Peter

PS: Yes, I'm using 2.6, but I don't think that's relevant for the problem.
 
P

Paul Boddie

Agreed, but won't the package manager tell him if python-sqlite is
installed?

It shouldn't need to be installed: the python2.5 package includes the
sqlite3 module and the _sqlite extension module. He's running a more
modern version of Ubuntu than I am, but I don't think that they've
reintroduced the python-sqlite package in any form.
That would be the next step since it appears that SQLite
intself is already installed. Since Ubuntu uses precompied binaries,
Python should be configured for SQLite which again leaves no python-
sqlite as the only possibility (yeah right). BTW Python is easy to
install manually.

Indeed, which is why I think that there must be a manually installed
Python on his system, especially given that /usr/local/lib/python2.5/
sqlite3/__init__.py is one of the files mentioned in the traceback.

Paul
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,226
Members
46,815
Latest member
treekmostly22

Latest Threads

Top