Python plain-text database or library that supports joins?

felciano · Jun 22, 2007

Hello --

Is there a convention, library or Pythonic idiom for performing
lightweight relational operations on flatfiles? I frequently find
myself writing code to do simple SQL-like operations between flat
files, such as appending columns from one file to another, linked
through a common id. For example, take a list of addresses and append
a 'district' field by looking up a congressional district from a
second file that maps zip codes to districts.

Conceptually this is a simple database operation with a join on a
common field (zip code in the above example). Other cases use other
relational operators (projection, cross-product, etc.), so I'm really
looking for something SQL-like in functionality. However, the data is
in flat files, the file structure changes frequently, the files are
dynamically generated from a range of sources, are short-lived in
nature, and otherwise do not warrant the hassle of a database setup. So
I've been looking around for a nice, Pythonic, zero-config (no
parsers, no setup/teardown, etc.) solution for simple queries that
handles a database of csv-files-with-headers automatically. There are
a number of solutions that are close, but in the end come up short:

- KirbyBase 1.9 (latest Python version) is the closest that I could
find, as it lets you keep your data in flatfiles and perform
operations using the field names from those text-based tables, but it
doesn't support joins (the more recent Ruby version seems to).
- Buzhug and SQLite have their own data structures, with no automatic .tab
or .csv parsing (unless SQLite includes a way to map flatfiles to
SQLite virtual tables that I don't know about).
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/159974 is
heading in the right direction, as it shows how to perform relational
operations on lists, but these are index-based rather than field-name-based.
- http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/498130 and
http://furius.ca/pubcode/pub/conf/common/bin/csv-db-import.html
provide ways of automatically populating DBs but not the reverse
(persist changes back out to the data files)

The closest alternatives I've found are the GNU textutils that support
join, cut, merge, etc but I need to add additional logic they don't
support, nor do they allow field-level write operations from Python
(UPDATE ... WHERE ...). Normally I'd jump right in and start coding
but this seems like something so common that I would have expected
someone else to have solved, so in the interest of not re-inventing
the wheel I thought I'd see if anyone had any other suggestions. Any
thoughts?

Thanks!

Ramon

askel · Jun 22, 2007

Ramon,

I don't think that using flat text files as a database is common these
days. If you need relational database features, what stops you from
using an RDBMS? If the only reason for that is some legacy system, then
I'd still use an in-memory SQLite database for all relational operations.
Import, process, export back to text if you need to.
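
A minimal sketch of that import, process, export round trip, assuming Python 3 and only the standard csv and sqlite3 modules. The helper names load_csv and dump_query, the table names, and the sample rows (including the district labels) are made up for illustration; real files would be opened with open() instead of io.StringIO.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, fileobj):
    """Create `table` from a CSV stream with a header row and load its rows."""
    reader = csv.reader(fileobj)
    header = next(reader)
    cols = ", ".join('"%s"' % name for name in header)
    marks = ", ".join("?" for _ in header)
    # Columns are declared without types; every value is stored as text.
    conn.execute('CREATE TABLE "%s" (%s)' % (table, cols))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), reader)


def dump_query(conn, sql, fileobj):
    """Run `sql` and write the result, header included, as CSV."""
    cursor = conn.execute(sql)
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow([d[0] for d in cursor.description])
    writer.writerows(cursor)


# Made-up sample data standing in for the two flat files.
addresses = io.StringIO("name,zip\nAnn,94110\nBob,10001\n")
districts = io.StringIO("zip,district\n94110,CA-12\n10001,NY-10\n")

conn = sqlite3.connect(":memory:")  # nothing is written to disk
load_csv(conn, "addresses", addresses)
load_csv(conn, "districts", districts)

out = io.StringIO()
dump_query(
    conn,
    "SELECT a.name, a.zip, d.district "
    "FROM addresses a LEFT JOIN districts d ON a.zip = d.zip "
    "ORDER BY a.name",
    out,
)
result = out.getvalue()
print(result)
```

The table and column names are interpolated straight from the file header, so this is only reasonable for files you trust.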

felciano · Jun 23, 2007

These are often one-off operations, so those import + export steps are
non-trivial overhead. For example, most log files are structured, but
it seems like we still use scripts or command line tools to find data
in those files. I'm essentially doing the same thing, only with
operations across multiple files (e.g. merge records from these two files
based on a common key, or append a column based on a lookup value). I
may end up having to go to a DB, but that seems like a heavyweight jump
for what are otherwise simple operations.
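
For reference, the "append a column based on a lookup value" case can be sketched with nothing but the standard csv module, assuming Python 3 and files small enough to hold in memory. The function name append_column and the sample rows are hypothetical, and this is one possible approach rather than an established idiom.

```python
import csv
import io


def append_column(left_rows, lookup_rows, key, field, default=""):
    """Return copies of left_rows with `field` added, looked up by `key`.

    Behaves like a left join: rows with no match get `default`.
    """
    lookup = {row[key]: row[field] for row in lookup_rows}
    joined = []
    for row in left_rows:
        new_row = dict(row)
        new_row[field] = lookup.get(row[key], default)
        joined.append(new_row)
    return joined


# Made-up sample data standing in for the two flat files.
addresses = io.StringIO("name,zip\nAnn,94110\nBob,10001\nCy,99999\n")
districts = io.StringIO("zip,district\n94110,CA-12\n10001,NY-10\n")

rows = append_column(
    list(csv.DictReader(addresses)),
    csv.DictReader(districts),
    key="zip",
    field="district",
)

# Write the merged rows back out as CSV, header first.
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=["name", "zip", "district"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
merged = out.getvalue()
print(merged)
```

If the lookup file repeats a key, the last row wins, which may or may not be what a given one-off job wants.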

Maybe this is the wrong forum for the question. I prefer programming
in Python, but the use cases I'm looking at are closer to shell scripting.
I'd be perfectly happy with a more powerful version of GNU textutils
that allowed for greater flexibility in text manipulation.

HTH,

Ramon

Alan Isaac · Jun 23, 2007

Not Python, but maybe relevant:
http://www.scriptaworks.com/cgi-bin/wiki.sh/NoSQL/HomePage

Alan Isaac

Michele Simionato · Jun 23, 2007

Have you looked at itools?

http://www.ikaaro.org/itools#itools.csv

HTH,

Michele Simionato

Joshua J. Kugler · Jun 26, 2007

Two pointers, but maybe not a complete solution:

http://search.cpan.org/dist/DBD-Sprite/
Perl library that uses CSV files and supports simple joins. Maybe a port of
this?

http://www.biostat.wisc.edu/~annis/creations/pseudb.html
Functional interface for CSV files inspired by Sprite, but does not support
joins. Possibly could be extended?

j
