Parsing by Line Data

python1

Having slight trouble conceptualizing a way to write this script. The
problem is that I have a bunch of lines in a file, for example:

01A\n
02B\n
01A\n
02B\n
02C\n
01A\n
02B\n
..
..
..

The lines beginning with '01' are the 'header' records, whereas the
lines beginning with '02' are detail. There can be several detail lines
to a header.

I'm looking for a way to put the '01' line and the '02' lines that follow it
into one list, and to start a new list when the next '01' record is found.

How would you do this? I'm used to using 'readlines()' to pull the file
data line by line, but in this case, determining the break-point will
need to be done by reading the '01' from the line ahead. Would you need
to read the whole file into a string and use a regex to break where a
'\n01' is found?
 
Eddie Corns

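def gen_records(src):
    """Yield one list per record: an '01' header line plus its '02' details."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header ends the previous record, if there is one.
            if rec:
                yield rec
            rec = [line]
        else:
            # Detail line: attach it to the current record.
            rec.append(line)
    # Emit the last record once the input runs out.
    if rec:
        yield rec

with open('input-file') as inf:
    for record in gen_records(inf):
        print(record)  # replace with your own processing, e.g. do_something_to_list(record)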

Eddie
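A quick way to check Eddie's generator is to feed it the sample lines from the question through io.StringIO, which iterates line by line like an open file. This is a minimal sketch: it repeats gen_records so it runs on its own, and the sample string is copied from the question rather than read from a real file.

```python
import io

def gen_records(src):
    """Group lines into records that each start with an '01' header."""
    rec = []
    for line in src:
        if line.startswith('01'):
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    if rec:
        yield rec

# Sample data from the question, wrapped so it iterates like a file.
sample = io.StringIO("01A\n02B\n01A\n02B\n02C\n01A\n02B\n")

# Strip the trailing newline from each line to make the result easy to read.
records = [[line.rstrip('\n') for line in rec] for rec in gen_records(sample)]
print(records)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```

One thing to watch: if the file can start with '02' lines before any header, those lines come out as a record of their own with no '01' line at the front, so check rec[0] if that matters.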
 
Bill Dandreta

First, let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary something like this (untested):

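myfile = 'input-file'  # hypothetical path to your data file
myinput = open(myfile, 'r')
lines = myinput.readlines()
myinput.close()

mydict = {}
index = -1
counter = 0  # avoids a NameError if a detail line comes before any header

for l in lines:
    if l[0:2] == '01':
        # Header line: start a new record and reset the position counter.
        counter = 0
        index += 1
    # Store the line minus its 2-character record code under
    # (record index, position within the record).
    mydict[(index, counter)] = l[2:]
    counter += 1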

You can easily extract the data with a nested loop.

Bill
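Bill's remark about extracting the data with a nested loop might look something like the sketch below. The keys are (record index, line position) tuples, so the outer loop walks the record indexes and the inner loop walks the positions in order. The hard-coded list stands in for myinput.readlines() and is an assumption for illustration only.

```python
# Sample lines standing in for myinput.readlines().
lines = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]

# Build the dictionary as in Bill's post: key = (record index, position).
mydict = {}
index = -1
counter = 0
for l in lines:
    if l[0:2] == '01':
        counter = 0
        index += 1
    mydict[(index, counter)] = l[2:]
    counter += 1

# Nested loop: outer over record indexes, inner over positions in a record.
records = []
for i in range(index + 1):
    positions = sorted(pos for (rec, pos) in mydict if rec == i)
    records.append([mydict[(i, pos)].rstrip('\n') for pos in positions])

print(records)
# [['A', 'B'], ['A', 'B', 'C'], ['A', 'B']]
```

Scanning every key once per record is fine for small files; for large ones, a list of lists (as in Eddie's generator) avoids that repeated scan.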
 
python1

Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday :)
 
python1

Thanks Bill. I'll use this script in place of Eddie's if the Python on our
AIX box is older than 2.2.

Thanks again.
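For an interpreter without generators (they arrived in Python 2.2, where they need "from __future__ import generators", and became standard in 2.3), the same grouping as Eddie's can be done by building a list of lists up front. A minimal sketch under that assumption; group_records is a hypothetical name, and the code simply avoids yield and with.

```python
def group_records(lines):
    """Return a list of records; each record is a list of lines
    starting with an '01' header line. No generator is used."""
    records = []
    current = []
    for line in lines:
        if line[0:2] == '01':
            # A header starts a new record; save the previous one first.
            if current:
                records.append(current)
            current = [line]
        else:
            # Detail line: attach it to the current record.
            current.append(line)
    # Don't lose the last record at the end of the input.
    if current:
        records.append(current)
    return records

sample = ["01A\n", "02B\n", "01A\n", "02B\n", "02C\n", "01A\n", "02B\n"]
records = group_records(sample)
print(len(records))
# 3
```

With a real file this would be called as group_records(open('input-file').readlines()). The trade-off against the generator is that every record is held in memory at once.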
 
Mitja

I'd probably do something like
records = ('\n' + open('foo.data').read()).split('\n01')

You can later do
structured=[record.split('\n') for record in records]
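to get a list of lists. With data like the sample, structured[0] is [''] (from
the added leading '\n'), the '01' is stripped from the first line of every
record, and the last record ends with an empty string left by the final
newline. There may be other flaws, but I guess the concept is clear.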
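The question also asked about using a regex to break where '\n01' is found. One way, assuming the whole file fits in memory, is re.split with the pattern \n(?=01): it consumes only the newline in front of each header, and the lookahead (?=01) keeps the '01' at the start of the next record, so no prefix is lost and no empty first element appears. The sample string below stands in for open('foo.data').read().

```python
import re

# Stand-in for open('foo.data').read().
data = "01A\n02B\n01A\n02B\n02C\n01A\n02B\n"

# Split at each newline that is immediately followed by '01'.
# The lookahead matches without consuming, so '01' stays with the next record.
chunks = re.split(r'\n(?=01)', data)

# Break each chunk into lines; splitlines() handles the trailing newline
# without leaving an empty string at the end of the last record.
structured = [chunk.splitlines() for chunk in chunks]
print(structured)
# [['01A', '02B'], ['01A', '02B', '02C'], ['01A', '02B']]
```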
 
