Formatting question.

M

mike5160

Hi all,

My input file looks like this : ( the data is separated by tabs )

11/26/2007 56.366 898.90 -10.086 23.11 1212.3
11/26/2007 52.25 897.6 -12.5 12.6 13333.5
11/26/2007 52.25 897.6 -12.5 12.6 133.5

The output I'm trying to get is as follows :

( Insert NBUSER.Grochamber Values
'11/26/2007','56.366','898.90','-10.086','23.11','1212.3', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','13333.5', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','133.5', )

The following is the program i have written so far :


LoL = []

for line in open('mydata.txt'):
LoL.append(line.split("\t"))

print "read from a file: ", LoL,

outfile = open("out.dat", "w")

lilength = len(LoL)
liwidelength = len(LoL[1])


print "length of list is " , lilength, "long"
print "length of list is " , liwidelength, "long"

for x in range(lilength):
outfile.write(" ( ")
outfile.write('Insert NBUSER.Grochamber Values ')
for y in range(liwidelength):
outfile.write( "'%s'," % (LoL[x][y]))
outfile.write(" ) \n")

outfile.close()


I have 3 questions :

1. The formatting in the first line comes out wrong all the time. I m
using windows python 2.5.1. The last part of the first line is always
on the second line.

2. How do I avoid the "," symbol after the last entry in the line?
(this are supposed to be sql-queries - importing excel based tabbed
data to sql database)

3. What do I do when the data is missing? Like missing data?

Thanks for all your help!

Mike
 
S

Sergio Correia

Hey Mike,
Welcome to Python!

About your first issue, just change the line
outfile.write( "'%s'," % (LoL[x][y]))
With
outfile.write( "'%s'," % (LoL[x][y][:-1]))

Why? Because when you do the line.split, you are including the '\n' at
the end, so a new line is created.

Now, what you are doing is not very pythonic (batteries are included
in python, so you could just use the CSV module). Also, the for x in
range(len(somelist)) is not recommended, you can just do something
like:

========================
import csv

infile = open("mydata.txt", "rb")
outfile = open("out.txt", "wb")

reader = csv.reader(infile, delimiter='\t')
writer = csv.writer(outfile, quotechar=None, delimiter = "\\")

for row in reader:
data = "'" + "', '".join(row) + "'"
base = " ( Insert NBUSER.Grochamber Values %s, )"
writer.writerow([base % data])

infile.close()
outfile.close()
========================
The above lines works like your program, writing exactly what you asked.
Again, all lists are iterable, you don't need to iterate an integer
from 1 to len(list). (isn't python wonderful?)

HTH,
Sergio
 
D

Dennis Lee Bieber

Hi all,

My input file looks like this : ( the data is separated by tabs )

11/26/2007 56.366 898.90 -10.086 23.11 1212.3
11/26/2007 52.25 897.6 -12.5 12.6 13333.5
11/26/2007 52.25 897.6 -12.5 12.6 133.5

The output I'm trying to get is as follows :

( Insert NBUSER.Grochamber Values
'11/26/2007','56.366','898.90','-10.086','23.11','1212.3', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','13333.5', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','133.5', )
said:
2. How do I avoid the "," symbol after the last entry in the line?
(this are supposed to be sql-queries - importing excel based tabbed
data to sql database)
If those are SQL inserts, the ( is in the wrong place...

insert into NBUSER.Grochamber values (v1, v2, ... , vx)
3. What do I do when the data is missing? Like missing data?

First, for reading the file, recommend you look at the CSV module,
which can be configured to use TABS rather than COMMAS.

For SQL -- if you are going to be writing raw text strings to an
output file for later batching, YOU are going to have to supply some
means to properly escape the data. The better way is to have the program
connect to the database, using the applicable database adapter: MySQLdb
for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC
adapter if going that route... Let IT do the escaping.

Now, since MySQLdb just happens to expose the escaping function, AND
just uses %s formatting of the results, one could easily get stuff to
write to a file.

import MySQLdb
con = MySQLdb.connect(host="localhost", user="test", passwd="test", db="test")
data = [ "11/26/2007 56.366 898.90 -10.086 23.11 1212.3",
.... "11/26/2007 897.6 O'Reilly 12.6 13333.5",
.... "11/26/2007 52.25 897.6 -12.5 12.6 133.5" ]

Note how I left out a field (two tabs, nothing between), and how I
put in a data item with a ' in it.
.... flds = ln.split("\t")
.... placeholders = ", ".join(["%s"] * len(flds))
.... sql = BASE % placeholders
.... sql = sql % con.literal(flds)
.... print sql
....
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
Note how the empty field is just '' (If you really need a NULL,
you'll have to do some games to put a Python None entity into that empty
string field). Also note how the single quote string value has been
escaped.

Something like this for NULL in STRING DATA -- if a field were
numeric 0 it would get substituted with a NULL too...
.... flds = ln.split("\t")
.... placeholders = ", ".join(["%s"] * len(flds))
.... sql = BASE % placeholders
.... flds = [(fld or None) for fld in flds]
.... sql = sql % con.literal(flds)
.... print sql
....
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', NULL, '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
M

mike5160

Hey Mike,
Welcome to Python!

About your first issue, just change the line
outfile.write( "'%s'," % (LoL[x][y]))
With
outfile.write( "'%s'," % (LoL[x][y][:-1]))

Why? Because when you do the line.split, you are including the '\n' at
the end, so a new line is created.

Now, what you are doing is not very pythonic (batteries are included
in python, so you could just use the CSV module). Also, the for x in
range(len(somelist)) is not recommended, you can just do something
like:

========================
import csv

infile = open("mydata.txt", "rb")
outfile = open("out.txt", "wb")

reader = csv.reader(infile, delimiter='\t')
writer = csv.writer(outfile, quotechar=None, delimiter = "\\")

for row in reader:
data = "'" + "', '".join(row) + "'"
base = " ( Insert NBUSER.Grochamber Values %s, )"
writer.writerow([base % data])

infile.close()
outfile.close()
========================
The above lines works like your program, writing exactly what you asked.
Again, all lists are iterable, you don't need to iterate an integer
from 1 to len(list). (isn't python wonderful?)

HTH,
Sergio

My input file looks like this : ( the data is separated by tabs )
11/26/2007 56.366 898.90 -10.086 23.11 1212.3
11/26/2007 52.25 897.6 -12.5 12.6 13333.5
11/26/2007 52.25 897.6 -12.5 12.6 133.5
The output I'm trying to get is as follows :
( Insert NBUSER.Grochamber Values
'11/26/2007','56.366','898.90','-10.086','23.11','1212.3', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','13333.5', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','133.5', )
The following is the program i have written so far :
for line in open('mydata.txt'):
LoL.append(line.split("\t"))
print "read from a file: ", LoL,
outfile = open("out.dat", "w")
lilength = len(LoL)
liwidelength = len(LoL[1])
print "length of list is " , lilength, "long"
print "length of list is " , liwidelength, "long"
for x in range(lilength):
outfile.write(" ( ")
outfile.write('Insert NBUSER.Grochamber Values ')
for y in range(liwidelength):
outfile.write( "'%s'," % (LoL[x][y]))
outfile.write(" ) \n")
outfile.close()

I have 3 questions :
1. The formatting in the first line comes out wrong all the time. I m
using windows python 2.5.1. The last part of the first line is always
on the second line.
2. How do I avoid the "," symbol after the last entry in the line?
(this are supposed to be sql-queries - importing excel based tabbed
data to sql database)
3. What do I do when the data is missing? Like missing data?
Thanks for all your help!

HI Sergio,

First of all, thanks for your reply and yes I'm new to Python.
Did a google on CSV and I am reading the documentation about it right
now. In the post I mentioned I was using Windows. I also have a laptop
with linux installed on it. When I ran the same program on my linux
laptop I did see the \n included in the list. Somehow, I did not see
it on windows, or missed it. So that cleared up the first problem.
Also, I will be doing a lot of this data importing from excel etc. can
you point me to a tutorial/document/book etc. where I can find
snippets of using various python utilities. For eg. something which
has the sample for using "line.split("\t") " or "
outfile.write( "'%s'," % (LoL[x][y][:-1])) " , explaining the various
options available. The default "Idle gui help" is not too informative
to a newbie like me.

Thanks again for your reply,
Mike.
 
M

mike5160

My input file looks like this : ( the data is separated by tabs )
11/26/2007 56.366 898.90 -10.086 23.11 1212.3
11/26/2007 52.25 897.6 -12.5 12.6 13333.5
11/26/2007 52.25 897.6 -12.5 12.6 133.5
The output I'm trying to get is as follows :
( Insert NBUSER.Grochamber Values
'11/26/2007','56.366','898.90','-10.086','23.11','1212.3', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','13333.5', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','133.5', )

2. How do I avoid the "," symbol after the last entry in the line?
(this are supposed to be sql-queries - importing excel based tabbed
data to sql database)

If those are SQL inserts, the ( is in the wrong place...

insert into NBUSER.Grochamber values (v1, v2, ... , vx)
3. What do I do when the data is missing? Like missing data?

First, for reading the file, recommend you look at the CSV module,
which can be configured to use TABS rather than COMMAS.

For SQL -- if you are going to be writing raw text strings to an
output file for later batching, YOU are going to have to supply some
means to properly escape the data. The better way is to have the program
connect to the database, using the applicable database adapter: MySQLdb
for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC
adapter if going that route... Let IT do the escaping.

Now, since MySQLdb just happens to expose the escaping function, AND
just uses %s formatting of the results, one could easily get stuff to
write to a file.
import MySQLdb
con = MySQLdb.connect(host="localhost", user="test", passwd="test", db="test")
data = [ "11/26/2007 56.366 898.90 -10.086 23.11 1212.3",

... "11/26/2007 897.6 O'Reilly 12.6 13333.5",
... "11/26/2007 52.25 897.6 -12.5 12.6 133.5" ]

Note how I left out a field (two tabs, nothing between), and how I
put in a data item with a ' in it.

... flds = ln.split("\t")
... placeholders = ", ".join(["%s"] * len(flds))
... sql = BASE % placeholders
... sql = sql % con.literal(flds)
... print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')



Note how the empty field is just '' (If you really need a NULL,
you'll have to do some games to put a Python None entity into that empty
string field). Also note how the single quote string value has been
escaped.

Something like this for NULL in STRING DATA -- if a field were
numeric 0 it would get substituted with a NULL too...

... flds = ln.split("\t")
... placeholders = ", ".join(["%s"] * len(flds))
... sql = BASE % placeholders
... flds = [(fld or None) for fld in flds]
... sql = sql % con.literal(flds)
... print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', NULL, '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')



--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/98

Hi Dennis,

Thanks to you for your reply. I am a newbie to Python and appreciate
you helping me. Now, I am importing data from an excel sheet and
getting it ready for a derby database. I am to use netbeans, since our
research team uses that. However, derby database uses sql entries to
update the database. And I m trying to format all the excel data I
have, which I got from using labview. I suggested that we use awk/perl/
python etc. and finally after looking at the documentation available I
figured Python would be best. However, (see my reply above) I am
looking for a sample book/document etc. somebody suggested we try
Python Phrasebook. But that one covers a lot of different fields
whereas for my purposes I need a book with examples on using Python in
the above manner. If you or anybody knows about this kind of book
please let me know.

Thank you very much for your help,
Mike.
 
M

mike5160

If those are SQL inserts, the ( is in the wrong place...
insert into NBUSER.Grochamber values (v1, v2, ... , vx)
First, for reading the file, recommend you look at the CSV module,
which can be configured to use TABS rather than COMMAS.
For SQL -- if you are going to be writing raw text strings to an
output file for later batching, YOU are going to have to supply some
means to properly escape the data. The better way is to have the program
connect to the database, using the applicable database adapter: MySQLdb
for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC
adapter if going that route... Let IT do the escaping.
Now, since MySQLdb just happens to expose the escaping function, AND
just uses %s formatting of the results, one could easily get stuff to
write to a file.
import MySQLdb
con = MySQLdb.connect(host="localhost", user="test", passwd="test", db="test")
data = [ "11/26/2007 56.366 898.90 -10.086 23.11 1212.3",
... "11/26/2007 897.6 O'Reilly 12.6 13333.5",
... "11/26/2007 52.25 897.6 -12.5 12.6 133.5" ]
Note how I left out a field (two tabs, nothing between), and how I
put in a data item with a ' in it.
... flds = ln.split("\t")
... placeholders = ", ".join(["%s"] * len(flds))
... sql = BASE % placeholders
... sql = sql % con.literal(flds)
... print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
Note how the empty field is just '' (If you really need a NULL,
you'll have to do some games to put a Python None entity into that empty
string field). Also note how the single quote string value has been
escaped.
Something like this for NULL in STRING DATA -- if a field were
numeric 0 it would get substituted with a NULL too...
... flds = ln.split("\t")
... placeholders = ", ".join(["%s"] * len(flds))
... sql = BASE % placeholders
... flds = [(fld or None) for fld in flds]
... sql = sql % con.literal(flds)
... print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', NULL, '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/98143

Hi Dennis,

Thanks to you for your reply. I am a newbie to Python and appreciate
you helping me. Now, I am importing data from an excel sheet and
getting it ready for a derby database. I am to use netbeans, since our
research team uses that. However, derby database uses sql entries to
update the database. And I m trying to format all the excel data I
have, which I got from using labview. I suggested that we use awk/perl/
python etc. and finally after looking at the documentation available I
figured Python would be best. However, (see my reply above) I am
looking for a sample book/document etc. somebody suggested we try
Python Phrasebook. But that one covers a lot of different fields
whereas for my purposes I need a book with examples on using Python in
the above manner. If you or anybody knows about this kind of book
please let me know.

Thank you very much for your help,
Mike.

Oops! Sorry I did not know what I did , but I just noticed that I
changed the subject of the Discussion twice. I just want every body to
know that it was unintentional.

Thanks,
Mike.
 
D

Dennis Lee Bieber

Thanks to you for your reply. I am a newbie to Python and appreciate
you helping me. Now, I am importing data from an excel sheet and
getting it ready for a derby database. I am to use netbeans, since our
research team uses that. However, derby database uses sql entries to
update the database. And I m trying to format all the excel data I
have, which I got from using labview. I suggested that we use awk/perl/
python etc. and finally after looking at the documentation available I
figured Python would be best. However, (see my reply above) I am
looking for a sample book/document etc. somebody suggested we try
Python Phrasebook. But that one covers a lot of different fields
whereas for my purposes I need a book with examples on using Python in
the above manner. If you or anybody knows about this kind of book
please let me know.
Unfortunately, you probably won't find any single book...

Parsing fixed format (if still variable line length) text files is
simplified by Python's string.split() and slicing, but those are just
built-in functions for simple algorithms that are language independent.
You might find it under the term "tokenizing"

Formatting SQL statements is... SQL... a totally separate language,
hypothetically standardized but having lots of DBMS specific dialects.

Also, you appear to be looking at it from the direction of
translating tab separated output file from Excel into a sequence of SQL
insert statements which will be written to another file, then "batched"
into some DBMS command line interpreter. That means that you will have
to be responsible for knowing how to escape special characters, properly
indicating nulls, etc.

Presuming http://db.apache.org/derby/ is the DBMS you mention, I
wonder if you would not be better off converting the Excel data into an
XML file of the type wanted by
http://db.apache.org/derby/integrate/db_ddlutils.html

Otherwise, I'm afraid to say, I'd suggest coding the Excel parser
/in/ Java, and use JDBC to directly insert the data... (If there were an
ODBC compatible driver, I'd suggest using a Python ODBC adapter and
doing it from Python).

If using the "ij" utility from a command line, please note that it
supports multiple record insert; instead of

insert into <table> values (a, ..., z);
insert into <table> values (a2, ..., z2);
....
insert into <table> values (aX, ..., zX);

you can use

insert into <table> values
(a, ..., z),
(a2, ..., z2),
....
(aX, ..., zX);

though there may be a limit to how long the statement can be -- maybe
run in batches of 25-50 records at a time...





Thank you very much for your help,
Mike.
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/
 
M

M.-A. Lemburg

Dennis said:
Unfortunately, you probably won't find any single book...

Parsing fixed format (if still variable line length) text files is
simplified by Python's string.split() and slicing, but those are just
built-in functions for simple algorithms that are language independent.
You might find it under the term "tokenizing"

Formatting SQL statements is... SQL... a totally separate language,
hypothetically standardized but having lots of DBMS specific dialects.

Also, you appear to be looking at it from the direction of
translating tab separated output file from Excel into a sequence of SQL
insert statements which will be written to another file, then "batched"
into some DBMS command line interpreter. That means that you will have
to be responsible for knowing how to escape special characters, properly
indicating nulls, etc.

Presuming http://db.apache.org/derby/ is the DBMS you mention, I
wonder if you would not be better off converting the Excel data into an
XML file of the type wanted by
http://db.apache.org/derby/integrate/db_ddlutils.html

Otherwise, I'm afraid to say, I'd suggest coding the Excel parser
/in/ Java, and use JDBC to directly insert the data... (If there were an
ODBC compatible driver, I'd suggest using a Python ODBC adapter and
doing it from Python).

FYI: There is an Excel ODBC driver for Windows which is included in
the Microsoft MDAC package. Using it, you can query Excel tables
with SQL. mxODBC works great with it. OTOH, if you're on Windows
anyway, you can also use the win32 Python package and then tap
directly into Excel using COM.
If using the "ij" utility from a command line, please note that it
supports multiple record insert; instead of

insert into <table> values (a, ..., z);
insert into <table> values (a2, ..., z2);
...
insert into <table> values (aX, ..., zX);

you can use

insert into <table> values
(a, ..., z),
(a2, ..., z2),
...
(aX, ..., zX);

though there may be a limit to how long the statement can be -- maybe
run in batches of 25-50 records at a time...

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Nov 22 2007)________________________________________________________________________

:::: Try mxODBC.Zope.DA for Windows,Linux,Solaris,MacOSX for free ! ::::


eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,252
Messages
2,571,267
Members
47,908
Latest member
MagdalenaR

Latest Threads

Top