SQL and CSV

N

Nick

I have a requirement to read a CSV file. Normally, no problem, just
import CSV and slurp the file up.

However, in this case I want to filter out lines that have fields set
to particular values.

It would be neat to be able to do something like this.

select * from test.csv where status <> "Canceled"

Using adodb I can do this, so long as I don't have the where clause. :-
(

Is there a reasonable lightweight way of doing this in Python?

I could write some python code that is used to filter rows, and inport
that from config, but it's not quite as elegant as an SQL route.

Thanks

Nick
 
T

Tim Golden

Nick said:
I have a requirement to read a CSV file. Normally, no problem, just
import CSV and slurp the file up.

However, in this case I want to filter out lines that have fields set
to particular values.

It would be neat to be able to do something like this.

select * from test.csv where status <> "Canceled"

Using adodb I can do this, so long as I don't have the where clause. :-
(

Is there a reasonable lightweight way of doing this in Python?

I could write some python code that is used to filter rows, and inport
that from config, but it's not quite as elegant as an SQL route.


Not entirely clear what you are and aren't prepared to try here, but...
the most obvious Python-based way to do this is treating the csv reader
as an iterator and filtering there. Your last line suggests that's not
what you want but just in case I've misunderstood:

<test.csv>
id,code,status
1,"ONE","Active"
2,"TWO","Cancelled"
3,"THREE","Active"
</test.csv>

<code>
import csv

for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
if row['status'] != 'Cancelled':
print row

</code>

Doesn't seem too onerous, and could obviously be wrapped in
some useful class/module.

But if you really want to go the SQL route, I believe there are
ODBC adapters for CSV which, combined with PyODBC or CeODBC,
would probably take you where you want to go.

TJG
 
N

Nick

Nick said:
I have a requirement to read a CSV file. Normally, no problem, just
import CSV and slurp the file up.
However, in this case I want to filter out lines that have fields set
to particular values.
It would be neat to be able to do something like this.
select * from test.csv where status <> "Canceled"
Using adodb I can do this, so long as I don't have the where clause. :-
(
Is there a reasonable lightweight way of doing this in Python?
I could write some python code that is used to filter rows, and inport
that from config, but it's not quite as elegant as an SQL route.

Not entirely clear what you are and aren't prepared to try here, but...
the most obvious Python-based way to do this is treating the csv reader
as an iterator and filtering there. Your last line suggests that's not
what you want but just in case I've misunderstood:

<test.csv>
id,code,status
1,"ONE","Active"
2,"TWO","Cancelled"
3,"THREE","Active"
</test.csv>

<code>
import csv

for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
  if row['status'] != 'Cancelled':
    print row

</code>

Doesn't seem too onerous, and could obviously be wrapped in
some useful class/module.

But if you really want to go the SQL route, I believe there are
ODBC adapters for CSV which, combined with PyODBC or CeODBC,
would probably take you where you want to go.

TJG

Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as an sql
route.

Nick
 
M

Matimus

Not entirely clear what you are and aren't prepared to try here, but...
the most obvious Python-based way to do this is treating the csv reader
as an iterator and filtering there. Your last line suggests that's not
what you want but just in case I've misunderstood:

<code>
import csv
for row in csv.DictReader (open ("c:/temp/test.csv", "rb")):
  if row['status'] != 'Cancelled':
    print row

Doesn't seem too onerous, and could obviously be wrapped in
some useful class/module.
But if you really want to go the SQL route, I believe there are
ODBC adapters for CSV which, combined with PyODBC or CeODBC,
would probably take you where you want to go.

Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as an sql
route.

Nick

Well, if you are using 2.5.x you could always stuff it into a sqlite
in-memory database, and then execute a SQL query. Heck, you don't even
_need_ 2.5, but in 2.5 sqlite is part of the distribution.

Matt
 
T

Tim Golden

Nick said:
Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as an sql
route.


Still not clear what the restriction is. If you were writing
SQL you'd have to read *something* from your config file,
unless you're suggesting that the "config file" is in fact
a SQL file. Which is one way of doing it, but then you might
just as well have your config file as a Python file and
import it.

Have I missed the point somewhere here? Can you give an
example -- even a fictional one -- of what you couldn't
do using, say, the example I gave earlier?

TJG
 
N

Nick

Nick said:
Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as ansql
route.

Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
unless you're suggesting that the "config file" is in fact
aSQLfile. Which is one way of doing it, but then you might
just as well have your config file as a Python file and
import it.

Have I missed the point somewhere here? Can you give an
example -- even a fictional one -- of what you couldn't
do using, say, the example I gave earlier?

TJG

Solution found. In the end I used SQLite to read from a csv file, and
now I can query the CSV file. The file is read using the csv module

First create a function

def fraction(p, denom):
num, frac = p.split ('-')
return float (num) + float (frac) / denom

for use within queries.

Now build the class.

self.filename = filename
self.dialect = dialect
self.query = query
reader = csv.reader (open (filename, 'r'))
self.connection = sqlite.connect(":memory:")
self.connection.create_function("fraction", 2, fraction) #
Adds in function
self.cursor = self.connection.cursor()
first = True
for row in reader:
if first:
headers = []
for r in row:
n = r.strip().replace (' ', '_').replace ('-','_')
headers.append (n)
command = 'create table csv (%s)' % ','.join (headers)
self.cursor.execute (command)
first = False
else:
command = 'insert into csv values ("%s")' % '","'.join
(row)
self.cursor.execute (command)

and then I can use this

self.cursor.execute (self.query)
rows = self.cursor.fetchall()
headers = []
for r in self.cursor.description:
headers.append (r[0])
results = Results.Results (headers, self.name, {})
i = 0
for row in rows:
results.add (row, i)
i = i + 1
return results

to query the results.

Results.Results is one of my classes that's reused in lots of places.

The query then looks somethign like this

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status in ("CANCELLED")

union

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status not in ("CANCELLED")


All incredibly neat and the first time I've used SQLite.

nick
 
J

John Machin

Nick said:
Part of the problem is that the 'selection' needs to be in a config
file. I can put the if row['status'] != 'Cancelled': return True into
a config, read it and eval it, but its not quite as clean as ansql
route.
Still not clear what the restriction is. If you were writingSQLyou'd have to read *something* from your config file,
unless you're suggesting that the "config file" is in fact
aSQLfile. Which is one way of doing it, but then you might
just as well have your config file as a Python file and
import it.
Have I missed the point somewhere here? Can you give an
example -- even a fictional one -- of what you couldn't
do using, say, the example I gave earlier?

Solution found. In the end I used SQLite to read from a csv file, and
now I can query the CSV file. The file is read using the csv module

First create a function

def fraction(p, denom):
    num, frac = p.split ('-')
    return float (num) + float (frac) / denom

for use within queries.

Now build the class.

        self.filename  = filename
        self.dialect   = dialect
        self.query     = query
        reader = csv.reader (open (filename, 'r'))
        self.connection = sqlite.connect(":memory:")
        self.connection.create_function("fraction", 2, fraction) #
Adds in function
        self.cursor = self.connection.cursor()
        first = True
        for row in reader:
            if first:
                headers = []
                for r in row:
                    n = r.strip().replace (' ', '_').replace ('-','_')
                    headers.append (n)
                command = 'create table csv (%s)' % ','..join (headers)
                self.cursor.execute (command)
                first = False
            else:
                command = 'insert into csv values ("%s")' % '","'.join
(row)
                self.cursor.execute (command)

and then I can use this

        self.cursor.execute (self.query)
        rows = self.cursor.fetchall()
        headers = []
        for r in self.cursor.description:
            headers.append (r[0])
        results = Results.Results (headers, self.name, {})
        i = 0
        for row in rows:
            results.add (row, i)
            i = i + 1
        return results

to query the results.

Results.Results is one of my classes that's reused in lots of places.

The query then looks somethign like this

                select
                    Client_Reference_Number as TrdNbr,
                    Asset_Number as ISIN,
                    Quantity as Qty,
                    status
                from
                    csv

The remaining lines of your SELECT statement are incredibly redundant
AFAICT. It seems you have pushed the contents of your csv file into a
data base and pulled them ALL out again ... not what I'd call a
"query". What's the point?
 
D

Dennis Lee Bieber

The query then looks somethign like this

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status in ("CANCELLED")

union

select
Client_Reference_Number as TrdNbr,
Asset_Number as ISIN,
Quantity as Qty,
status
from
csv
where status not in ("CANCELLED")
Pardon? The union of "in ('CANCELLED')" and "not in ('CANCELLED')"
will be ALL records... -- possibly ordered with the "cancelled" first...

I'd probably just use

select ... from csv
order by status;

to group the values.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
P

Peter Otten

Nick said:
self.cursor = self.connection.cursor()
first = True
for row in reader:
if first:
headers = []
for r in row:
n = r.strip().replace (' ', '_').replace ('-','_')
headers.append (n)
command = 'create table csv (%s)' % ','.join (headers)
self.cursor.execute (command)
first = False
else:
command = 'insert into csv values ("%s")' % '","'.join
(row)
self.cursor.execute (command)

You can simplify that a bit:

cursor = self.cursor = self.connection.cursor()

first_row = next(reader)
headers = [column.strip().replace(" ", "_").replace("-", "_") for column in
first_row]
cursor.execute("create table csv (%s)" % ", ".join(headers))

placeholders = ", ".join("?"*len(headers))
command = "insert into csv values (%s)" % placeholders
cursor.executemany(command, reader)

While it may not matter here using placeholders instead of manually escaping
user-provided values is a good habit to get into.
self.cursor.execute (self.query)
rows = self.cursor.fetchall()

rows = self.cursor.execute(self.query)

doesn't build an intermediate list.
i = 0
for row in rows:
results.add (row, i)
i = i + 1

This is written

for i, row in enumerate(rows):
results.add(row, i)

in idiomatic Python.

Peter
 
L

Lawrence D'Oliveiro

While it may not matter here using placeholders instead of manually
escaping user-provided values is a good habit to get into.

Until you hit things it can't deal with.
 
N

Nick

The post you are replying to was talking about using the SQL library's "?"
syntax that automatically escapes values.  The usual reason this is
recommended (if I have understood correctly) is that the library code is
much more likely to foil injection attacks.  I have seen this mentioned
often and assume it is good advice.

Can you expand on your comment?  I assume you are thinking of how the
library might handle some strange class.  But aren't the number of types
limited by SQL?  In which case a "thing that can't be handled" could
presumably be managed by adding an appropriate __str__ or __float__ or
whatever?  And you would still use the library to give safety with other
values.

Maybe you could give an example of the kind of problem you're thinking of?

Thanks,
Andrew

Injection attacks aren't an issue, its a local app.

It's part of a reconciliation system, where sometimes data is in csv
files. If you want the whole csv file, you can use csv module without
a problem.

In some cases, I need to manipulate the data.

The choices are hard code the manipulation, or load the data from a
config file.

So what I've got is the query in the config and I can process it.

Nick
 
L

Lawrence D'Oliveiro

andrew said:
Can you expand on your comment? I assume you are thinking of how the
library might handle some strange class.

Consider something as simple as a "like" clause.
 
L

Lawrence D'Oliveiro

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top