Formatting question.

mike5160 · Nov 20, 2007

Hi all,

My input file looks like this (the data is separated by tabs):

11/26/2007 56.366 898.90 -10.086 23.11 1212.3
11/26/2007 52.25 897.6 -12.5 12.6 13333.5
11/26/2007 52.25 897.6 -12.5 12.6 133.5

The output I'm trying to get is as follows :

( Insert NBUSER.Grochamber Values
'11/26/2007','56.366','898.90','-10.086','23.11','1212.3', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','13333.5', )
( Insert NBUSER.Grochamber Values
'11/26/2007','52.25','897.6','-12.5','12.6','133.5', )

The following is the program I have written so far:

LoL = []

for line in open('mydata.txt'):
    LoL.append(line.split("\t"))

print "read from a file: ", LoL,

outfile = open("out.dat", "w")

lilength = len(LoL)
liwidelength = len(LoL[1])

print "length of list is " , lilength, "long"
print "length of list is " , liwidelength, "long"

for x in range(lilength):
    outfile.write(" ( ")
    outfile.write('Insert NBUSER.Grochamber Values ')
    for y in range(liwidelength):
        outfile.write( "'%s'," % (LoL[x][y]))
    outfile.write(" ) \n")

outfile.close()

I have 3 questions :

1. The formatting in the first line comes out wrong all the time. I'm
using Python 2.5.1 on Windows. The last part of the first line always
ends up on the second line.

2. How do I avoid the "," symbol after the last entry in the line?
(These are supposed to be SQL queries, for importing tab-separated
data exported from Excel into a SQL database.)

3. What do I do when the data is missing? Like missing data?

Thanks for all your help!

Mike

Sergio Correia · Nov 21, 2007

Hey Mike,
Welcome to Python!

About your first issue, just change the line
outfile.write( "'%s'," % (LoL[x][y]))
With
outfile.write( "'%s'," % (LoL[x][y][:-1]))

Why? Because when you do the line.split, you are including the '\n' at
the end, so a new line is created.
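One caveat on the slice: [:-1] is applied to every field, so it also drops the last character of fields that carry no newline. A minimal sketch, assuming Python 3 and a made-up helper name format_row, that strips the line ending once before splitting and uses join so that no trailing comma appears:

```python
def format_row(line):
    """Turn one tab-separated line into a quoted, comma-separated string."""
    # Remove the line ending once, before splitting, so that no field
    # carries a stray "\n" or "\r\n".
    fields = line.rstrip("\r\n").split("\t")
    # join() puts the separator only between items, so there is no
    # trailing comma after the last field.
    return ",".join("'%s'" % field for field in fields)


sample = "11/26/2007\t56.366\t898.90\t-10.086\t23.11\t1212.3\n"
print(format_row(sample))
```

This addresses questions 1 and 2 together; it does no escaping, so it is only suitable for data known to contain no quote characters.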

Now, what you are doing is not very pythonic (batteries are included
in python, so you could just use the CSV module). Also, the for x in
range(len(somelist)) is not recommended, you can just do something
like:

========================
import csv

infile = open("mydata.txt", "rb")
outfile = open("out.txt", "wb")

reader = csv.reader(infile, delimiter='\t')
writer = csv.writer(outfile, quotechar=None, delimiter = "\\")

for row in reader:
    data = "'" + "', '".join(row) + "'"
    base = " ( Insert NBUSER.Grochamber Values %s, )"
    writer.writerow([base % data])

infile.close()
outfile.close()
========================
The above lines work like your program, writing exactly what you asked.
Again, all lists are iterable, so you don't need to iterate an integer
from 0 to len(list) - 1. (Isn't Python wonderful?)
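For readers on Python 3, the same csv-based loop might look roughly like the sketch below. It reads from an in-memory string rather than mydata.txt so that it runs on its own, and it collects the lines in a list instead of writing out.txt; both choices are assumptions made for illustration. The trailing comma is left out of the template here.

```python
import csv
import io

# Stand-in for open("mydata.txt", newline=""); the contents are the
# first two sample rows from the original post.
infile = io.StringIO(
    "11/26/2007\t56.366\t898.90\t-10.086\t23.11\t1212.3\n"
    "11/26/2007\t52.25\t897.6\t-12.5\t12.6\t13333.5\n"
)

lines = []
# csv.reader removes the line ending and splits on tabs, and the loop
# iterates over the rows directly instead of over range(len(...)).
for row in csv.reader(infile, delimiter="\t"):
    data = "'" + "', '".join(row) + "'"
    lines.append(" ( Insert NBUSER.Grochamber Values %s )" % data)

for line in lines:
    print(line)
```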

HTH,
Sergio

Dennis Lee Bieber · Nov 21, 2007

mike5160 said:
2. How do I avoid the "," symbol after the last entry in the line?
(These are supposed to be SQL queries, for importing tab-separated
data exported from Excel into a SQL database.)

If those are SQL inserts, the ( is in the wrong place...

insert into NBUSER.Grochamber values (v1, v2, ... , vx)

3. What do I do when the data is missing? Like missing data?

First, for reading the file, I recommend you look at the csv module,
which can be configured to use TABS rather than COMMAS.

For SQL -- if you are going to be writing raw text strings to an
output file for later batching, YOU are going to have to supply some
means to properly escape the data. The better way is to have the program
connect to the database, using the applicable database adapter: MySQLdb
for MySQL, pysqlite2 (or some variant) for SQLite3, some generic ODBC
adapter if going that route... Let IT do the escaping.
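As a rough illustration of letting the adapter do the escaping, here is a sketch that uses the sqlite3 module from the standard library with an in-memory database. The table layout and column names are invented for the example, and sqlite3 uses "?" as its placeholder, whereas other adapters (MySQLdb, for instance) use "%s".

```python
import sqlite3

# In-memory database; the table and its column names are placeholders.
con = sqlite3.connect(":memory:")
con.execute(
    "create table Grochamber (day text, a text, b text, c text, d text, e text)"
)

rows = [
    "11/26/2007\t56.366\t898.90\t-10.086\t23.11\t1212.3",
    "11/26/2007\t\t897.6\tO'Reilly\t12.6\t13333.5",
]

for line in rows:
    fields = line.split("\t")
    placeholders = ", ".join("?" * len(fields))
    # The adapter substitutes the values itself, so the quote in
    # O'Reilly needs no manual escaping.
    con.execute("insert into Grochamber values (%s)" % placeholders, fields)

stored = con.execute("select a, c from Grochamber").fetchall()
print(stored)
con.close()
```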

Now, since MySQLdb just happens to expose the escaping function, AND
just uses %s formatting of the results, one could easily get stuff to
write to a file.

>>> import MySQLdb
>>> con = MySQLdb.connect(host="localhost", user="test", passwd="test", db="test")
>>> data = [ "11/26/2007\t56.366\t898.90\t-10.086\t23.11\t1212.3",
...          "11/26/2007\t\t897.6\tO'Reilly\t12.6\t13333.5",
...          "11/26/2007\t52.25\t897.6\t-12.5\t12.6\t133.5" ]

Note how I left out a field (two tabs, nothing between), and how I
put in a data item with a ' in it.
>>> BASE = "insert into NBUSER.Grochamber values (%s)"
>>> for ln in data:
...     flds = ln.split("\t")
...     placeholders = ", ".join(["%s"] * len(flds))
...     sql = BASE % placeholders
...     sql = sql % con.literal(flds)
...     print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', '', '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
Note how the empty field is just '' (If you really need a NULL,
you'll have to do some games to put a Python None entity into that empty
string field). Also note how the single quote string value has been
escaped.

Something like this for NULL in STRING DATA -- if a field were
numeric 0 it would get substituted with a NULL too...
>>> for ln in data:
...     flds = ln.split("\t")
...     placeholders = ", ".join(["%s"] * len(flds))
...     sql = BASE % placeholders
...     flds = [(fld or None) for fld in flds]
...     sql = sql % con.literal(flds)
...     print sql
...
insert into NBUSER.Grochamber values ('11/26/2007', '56.366', '898.90',
'-10.086', '23.11', '1212.3')
insert into NBUSER.Grochamber values ('11/26/2007', NULL, '897.6',
'O\'Reilly', '12.6', '13333.5')
insert into NBUSER.Grochamber values ('11/26/2007', '52.25', '897.6',
'-12.5', '12.6', '133.5')
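Without a database adapter at hand, the same NULL substitution can be sketched in plain Python (written for Python 3). The helper names sql_literal and to_insert are hypothetical, and the quote-doubling shown is the standard SQL convention rather than the backslash style in the MySQL output above; real escaping rules vary by DBMS, so treat this as an illustration only.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    # Standard SQL escapes a single quote by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def to_insert(line, table="NBUSER.Grochamber"):
    """Build one insert statement from a tab-separated line."""
    fields = line.split("\t")
    # Compare against "" explicitly: "fld or None" would also turn a
    # numeric 0 into NULL if the fields were ever converted to numbers.
    values = [None if fld == "" else fld for fld in fields]
    return "insert into %s values (%s)" % (
        table, ", ".join(sql_literal(v) for v in values))


print(to_insert("11/26/2007\t\t897.6\tO'Reilly\t12.6\t13333.5"))
```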
--
Wulfraed Dennis Lee Bieber KD6MOG
(e-mail address removed) (e-mail address removed)
HTTP://wlfraed.home.netcom.com/
(Bestiaria Support Staff: (e-mail address removed))
HTTP://www.bestiaria.com/

mike5160 · Nov 21, 2007

Hi Sergio,

First of all, thanks for your reply, and yes, I'm new to Python.
I googled the csv module and am reading its documentation right
now. In my post I mentioned I was using Windows. I also have a laptop
with Linux installed on it. When I ran the same program on my Linux
laptop I did see the \n included in the list. Somehow I did not see
it on Windows, or I missed it. So that cleared up the first problem.
Also, I will be doing a lot of this kind of data importing from Excel.
Can you point me to a tutorial, document, or book where I can find
snippets that use various Python utilities? For example, something that
has samples using line.split("\t") or
outfile.write( "'%s'," % (LoL[x][y][:-1])) and explains the various
options available. The default IDLE GUI help is not very informative
for a newbie like me.

Thanks again for your reply,
Mike.

mike5160 · Nov 21, 2007

Hi Dennis,

Thank you for your reply. I am a newbie to Python and appreciate
your help. I am importing data from an Excel sheet and
getting it ready for a Derby database. I have to use NetBeans, since our
research team uses it. However, the Derby database takes SQL statements
to update the database, so I'm trying to format all the Excel data I
have, which came from LabVIEW. I suggested that we use awk, Perl, or
Python, and after looking at the available documentation I
figured Python would be best. However (see my reply above), I am
looking for a sample book or document. Somebody suggested we try
Python Phrasebook, but that one covers a lot of different fields,
whereas for my purposes I need a book with examples of using Python in
the above manner. If you or anybody else knows of such a book,
please let me know.

Thank you very much for your help,
Mike.

mike5160 · Nov 21, 2007

Oops! Sorry, I don't know what I did, but I just noticed that I
changed the subject of the discussion twice. I just want everybody to
know that it was unintentional.

Thanks,
Mike.

Dennis Lee Bieber · Nov 22, 2007

Unfortunately, you probably won't find any single book...

Parsing fixed format (if still variable line length) text files is
simplified by Python's string.split() and slicing, but those are just
built-in functions for simple algorithms that are language independent.
You might find it under the term "tokenizing"
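A small sketch of the two approaches, written for Python 3. The delimited sample reuses values from this thread; the fixed-width layout (10 + 8 + 8 characters) is invented purely for illustration.

```python
# Delimited text: split() breaks the line at each tab.
delimited = "11/26/2007\t56.366\t898.90"
tokens = delimited.split("\t")

# Fixed-width text: slicing picks fields by column position.
# The layout (10 + 8 + 8 characters) is an invented example.
fixed = "11/26/2007  56.366  898.90"
date = fixed[0:10]
first = fixed[10:18].strip()
second = fixed[18:26].strip()

print(tokens)
print(date, first, second)
```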

Formatting SQL statements is... SQL... a totally separate language,
hypothetically standardized but having lots of DBMS specific dialects.

Also, you appear to be looking at it from the direction of
translating tab separated output file from Excel into a sequence of SQL
insert statements which will be written to another file, then "batched"
into some DBMS command line interpreter. That means that you will have
to be responsible for knowing how to escape special characters, properly
indicating nulls, etc.

Presuming http://db.apache.org/derby/ is the DBMS you mention, I
wonder if you would not be better off converting the Excel data into an
XML file of the type wanted by
http://db.apache.org/derby/integrate/db_ddlutils.html

Otherwise, I'm afraid to say, I'd suggest coding the Excel parser
/in/ Java, and use JDBC to directly insert the data... (If there were an
ODBC compatible driver, I'd suggest using a Python ODBC adapter and
doing it from Python).

If using the "ij" utility from a command line, please note that it
supports multiple record insert; instead of

insert into <table> values (a, ..., z);
insert into <table> values (a2, ..., z2);
....
insert into <table> values (aX, ..., zX);

you can use

insert into <table> values
(a, ..., z),
(a2, ..., z2),
....
(aX, ..., zX);

though there may be a limit to how long the statement can be -- maybe
run in batches of 25-50 records at a time...
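Generating such batched statements from Python could look something like the sketch below (Python 3). The function name batched_inserts and the default batch size of 25 are assumptions for the example, and each row is expected to be a string of already-escaped SQL literals.

```python
def batched_inserts(rows, table, batch_size=25):
    """Yield multi-row insert statements, batch_size rows at a time.

    Each row must already be a string of SQL literals such as
    "'11/26/2007', '56.366'"; escaping is not handled here.
    """
    for start in range(0, len(rows), batch_size):
        chunk = rows[start:start + batch_size]
        body = ",\n".join("(%s)" % row for row in chunk)
        yield "insert into %s values\n%s;" % (table, body)


# 60 dummy rows produce three statements of 25, 25 and 10 rows.
rows = ["'11/26/2007', '%d'" % n for n in range(60)]
statements = list(batched_inserts(rows, "NBUSER.Grochamber"))
print(len(statements))
print(statements[-1])
```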

M.-A. Lemburg · Nov 22, 2007

Dennis said:
Otherwise, I'm afraid to say, I'd suggest coding the Excel parser
/in/ Java, and use JDBC to directly insert the data... (If there were an
ODBC compatible driver, I'd suggest using a Python ODBC adapter and
doing it from Python).

FYI: There is an Excel ODBC driver for Windows which is included in
the Microsoft MDAC package. Using it, you can query Excel tables
with SQL. mxODBC works great with it. OTOH, if you're on Windows
anyway, you can also use the win32 Python package and then tap
directly into Excel using COM.

--
Marc-Andre Lemburg
eGenix.com

