Parsing by Line Data

python1 · Jun 17, 2004

Having slight trouble conceptualizing a way to write this script. The
problem is that I have a bunch of lines in a file, for example:

01A\n
02B\n
01A\n
02B\n
02C\n
01A\n
02B\n
..
..
..

The lines beginning with '01' are the 'header' records, whereas the
lines beginning with '02' are detail. There can be several detail lines
to a header.

I'm looking for a way to put each '01' line and its subsequent '02' lines into
one list, starting a new list when the next '01' record is found.

How would you do this? I'm used to using 'readlines()' to pull the file
data line by line, but in this case, determining the break-point will
need to be done by reading the '01' from the line ahead. Would you need
to read the whole file into a string and use a regex to break where a
'\n01' is found?

Eddie Corns · Jun 17, 2004

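def gen_records(src):
    """Yield one list per record: an '01' header line plus its '02' detail lines."""
    rec = []
    for line in src:
        if line.startswith('01'):
            # A new header closes the record collected so far.
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    # Emit the final record, which has no following header to trigger it.
    if rec:
        yield rec

inf = open('input-file')  # open() also works where the file() built-in no longer exists
for record in gen_records(inf):
    do_something_to_list(record)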

Eddie
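
To see what the generator above produces, here is a minimal sketch that runs it on the sample lines from the original question. The list `sample_lines` is an assumption standing in for the open file; any iterable of lines should behave the same way.

```python
def gen_records(src):
    """Group lines into records, starting a new record at each '01' header."""
    rec = []
    for line in src:
        if line.startswith('01'):
            if rec:
                yield rec
            rec = [line]
        else:
            rec.append(line)
    if rec:
        yield rec

# Sample data from the original question, standing in for an open file.
sample_lines = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']

records = list(gen_records(sample_lines))
for record in records:
    print(record)
```

Each printed list starts with a header line and holds the detail lines that followed it, so no look-ahead or regex is needed: the arrival of the next '01' line is what closes the previous record.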

Bill Dandreta · Jun 17, 2004

First let me preface my remarks by saying I am not much of a programmer,
so this may not be the best way to solve this, but I would use a
dictionary, something like this (untested):

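myinput = open(myfile, 'r')
lines = myinput.readlines()
myinput.close()

mydict = {}
index = -1
counter = 0

for l in lines:
    if l[0:2] == '01':
        # Header line: start a new record and restart the line counter.
        counter = 0
        index += 1
    # Key is (record number, line number in record); the 2-character prefix is dropped.
    mydict[(index, counter)] = l[2:]
    counter += 1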

You can easily extract the data with a nested loop.

Bill
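
The nested-loop extraction mentioned above could look something like the following sketch. It assumes the keys are `(record number, line number)` tuples with the header stored at line number 0 and the details numbered from 1; the list `records` and the inlined sample `lines` are hypothetical names used only for illustration.

```python
# Sample data from the original question, standing in for readlines() output.
lines = ['01A\n', '02B\n', '01A\n', '02B\n', '02C\n', '01A\n', '02B\n']

# Build the dictionary keyed by (record number, line number within the record).
mydict = {}
index = -1
counter = 0
for l in lines:
    if l[0:2] == '01':
        counter = 0
        index += 1
    mydict[(index, counter)] = l[2:]
    counter += 1

# Nested loop: walk record numbers, then line numbers, until a key is missing.
records = []  # hypothetical name for the rebuilt list of lists
i = 0
while (i, 0) in mydict:
    record = []
    j = 0
    while (i, j) in mydict:
        record.append(mydict[(i, j)])
        j += 1
    records.append(record)
    i += 1

print(records)
```

Because the two-character prefix was dropped when the dictionary was built, the position in each inner list (0 for the header, 1 and up for details) is the only thing that tells the line types apart.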

python1 · Jun 17, 2004

Thanks Eddie. Very creative. Knew I'd use the 'yield' keyword someday.

python1 · Jun 17, 2004

Thanks Bill. Will use this script in place of Eddie's if Python is older
than 2.2 on our AIX box.

Thanks again.

Mitja · Jun 18, 2004

I'd probably do something like
records = ('\n' + open('foo.data').read()).split('\n01')

You can later do
structured = [record.split('\n') for record in records]
to get a list of lists. The '01' prefix is stripped from each header line,
records[0] is an empty string, and there may be other flaws, but I guess the
concept is clear.
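
One possible way to tidy up the split-based approach is sketched below. It is only an illustration under stated assumptions: the variable `text` is inlined sample data standing in for the file contents, and restoring the '01' prefix and using `splitlines()` are choices made here, not part of the suggestion above.

```python
# Text as it would come back from reading the file; inlined so the sketch runs on its own.
text = '01A\n02B\n01A\n02B\n02C\n01A\n02B\n'

# Prepend '\n' so the first header is also preceded by the '\n01' separator.
records = ('\n' + text).split('\n01')

structured = [
    ('01' + record).splitlines()  # put back the '01' that split() consumed
    for record in records
    if record                     # drop the empty string before the first header
]
print(structured)
```

Using `splitlines()` rather than `split('\n')` avoids the trailing empty string that the final newline of the file would otherwise leave in the last record.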
