next line (data parsing)

robleachza · Jan 17, 2008

Hi there,
I'm struggling to find a sensible way to process a large chunk of
data--line by line, but also having the ability to move to subsequent
'next' lines within a for loop. I was hoping someone would be willing
to share some insights to help point me in the right direction. This
is not a file, so any file modules or methods available for file
parsing wouldn't apply.

I run a command on a remote host by using the pexpect (pxssh) module.
I get the result back, which is pages and pages of pre-formatted text.
This is a pared-down example (some will notice it's Tivoli schedule
output).

....
Job Name Run Time Pri Start Time Dependencies
Schedule HOST #ALL_LETTERS ( ) 00:01 10 22:00(01/16/08) LTR_CLEANUP

(SITE1 LTR_DB_LETTER 00:01 10
Total 00:01

Schedule HOST #DAILY ( ) 00:44 10 18:00(01/16/08) DAILY_LTR

(SITE3 RUN_LTR14_PROC 00:20 10
(SITE1 LTR14A_WRAPPER 00:06 10 SITE3#RUN_LTR14_PROC
(SITE1 LTR14B_WRAPPER 00:04 10 SITE1#LTR14A_WRAPPER
(SITE1 LTR14C_WRAPPER 00:03 10 SITE1#LTR14B_WRAPPER
(SITE1 LTR14D_WRAPPER 00:02 10 SITE1#LTR14C_WRAPPER
(SITE1 LTR14E_WRAPPER 00:01 10 SITE1#LTR14D_WRAPPER
(SITE1 LTR14F_WRAPPER 00:03 10 SITE1#LTR14E_WRAPPER
(SITE1 LTR14G_WRAPPER 00:03 10 SITE1#LTR14F_WRAPPER
(SITE1 LTR14H_WRAPPER 00:02 10 SITE1#LTR14G_WRAPPER
Total 00:44

Schedule HOST #CARDS ( ) 00:02 10 20:30(01/16/08) STR2_D

(SITE7 DAILY_MEETING_FILE 00:01 10
(SITE3 BEHAVE_HALT_FILE 00:01 10 SITE7#DAILY_HOME_FILE
Total 00:02
....

I can iterate over each line by setting a for loop on the data object;
no problem. But basically my intention is to locate the line "Schedule
HOST" and progressively move on to the 'next' line, parsing out the
pieces I care about, until I hit "Total"; then I resume at the
start of the for loop, which locates the next "Schedule HOST".

I realize this is a really basic problem, but I can't seem to
articulate my intention well enough to find documentation or examples
that have been helpful to me. I bought the Python Cookbook yesterday,
which has gotten me a lot further in some areas, but still hasn't
given me what I'm looking for. This is just a pet project to help me
reduce some of the tedious aspects of my daily tasks, so I've been
using this as a means to discover Python. I appreciate any insights that
would help set me in the right direction.

Cheers,
-Rob

Paul McGuire · Jan 17, 2008

Pyparsing will work on a string or a file, and will do the line-by-line
iteration for you. You just have to define the expected format
of the data. The sample code below parses the data that you posted.
From this example, you can refine the code by assigning names to the
different parsed fields, and use the field names to access the parsed
values.

More info about pyparsing at http://pyparsing.wikispaces.com.

-- Paul

from pprint import pprint
from pyparsing import *

# Building blocks for the fixed-format fields
integer = Word(nums)
timestamp = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2))
dateString = Combine(Word(nums, exact=2) + "/" +
                     Word(nums, exact=2) + "/" +
                     Word(nums, exact=2))

# "Schedule HOST #NAME ( ) hh:mm pri hh:mm(mm/dd/yy) [dependencies]"
schedHeader = Literal("Schedule HOST") + Word("#", alphas + "_") + "(" + ")" + \
    timestamp + integer + timestamp + "(" + dateString + ")" + \
    Optional(~LineEnd() + empty + restOfLine)
# "(SITE JOB_NAME hh:mm pri [dependencies]"
schedLine = Group(Word("(", alphanums) + Word(alphanums + "_") + timestamp +
                  integer + Optional(~LineEnd() + empty + restOfLine)
                  ) + LineEnd().suppress()
schedTotal = Literal("Total") + timestamp

sched = schedHeader + Group(OneOrMore(schedLine)) + schedTotal

# data is the string returned by the remote command
for s in sched.searchString(data):
    pprint(s.asList())
    print()

Prints:

['Schedule HOST',
'#ALL_LETTERS',
'(',
')',
'00:01',
'10',
'22:00',
'(',
'01/16/08',
')',
'LTR_CLEANUP ',
[['(SITE1', 'LTR_DB_LETTER', '00:01', '10']],
'Total',
'00:01']

['Schedule HOST',
'#DAILY',
'(',
')',
'00:44',
'10',
'18:00',
'(',
'01/16/08',
')',
'DAILY_LTR ',
[['(SITE3', 'RUN_LTR14_PROC', '00:20', '10'],
 ['(SITE1', 'LTR14A_WRAPPER', '00:06', '10', 'SITE3#RUN_LTR14_PROC '],
 ['(SITE1', 'LTR14B_WRAPPER', '00:04', '10', 'SITE1#LTR14A_WRAPPER '],
 ['(SITE1', 'LTR14C_WRAPPER', '00:03', '10', 'SITE1#LTR14B_WRAPPER '],
 ['(SITE1', 'LTR14D_WRAPPER', '00:02', '10', 'SITE1#LTR14C_WRAPPER '],
 ['(SITE1', 'LTR14E_WRAPPER', '00:01', '10', 'SITE1#LTR14D_WRAPPER '],
 ['(SITE1', 'LTR14F_WRAPPER', '00:03', '10', 'SITE1#LTR14E_WRAPPER '],
 ['(SITE1', 'LTR14G_WRAPPER', '00:03', '10', 'SITE1#LTR14F_WRAPPER '],
 ['(SITE1', 'LTR14H_WRAPPER', '00:02', '10', 'SITE1#LTR14G_WRAPPER ']],
'Total',
'00:44']

['Schedule HOST',
'#CARDS',
'(',
')',
'00:02',
'10',
'20:30',
'(',
'01/16/08',
')',
'STR2_D ',
[['(SITE7', 'DAILY_MEETING_FILE', '00:01', '10'],
 ['(SITE3', 'BEHAVE_HALT_FILE', '00:01', '10', 'SITE7#DAILY_HOME_FILE ']],
'Total',
'00:02']

Scott David Daniels · Jan 17, 2008

If you can do:

for line in whatever:
    ...

then you can do:

source = iter(whatever)
for intro in source:
    if intro.startswith('Schedule '):
        for line in source:
            if line.startswith('Total'):
                break
            process(intro, line)
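
As a rough, self-contained sketch of the same idea applied to output held in a string: the sample text, the parse_schedules name, and the dictionary layout below are assumptions made for illustration, not part of the original post. splitlines() turns the string into lines, and both loops draw from one shared iterator, so the inner loop picks up on the line right after the header.

```python
SAMPLE = """\
Schedule HOST #ALL_LETTERS ( ) 00:01 10 22:00(01/16/08) LTR_CLEANUP

(SITE1 LTR_DB_LETTER 00:01 10
Total 00:01

Schedule HOST #CARDS ( ) 00:02 10 20:30(01/16/08) STR2_D

(SITE7 DAILY_MEETING_FILE 00:01 10
(SITE3 BEHAVE_HALT_FILE 00:01 10 SITE7#DAILY_HOME_FILE
Total 00:02
"""


def parse_schedules(text):
    """Return one dict per schedule: its header fields and its job lines."""
    schedules = []
    source = iter(text.splitlines())      # one iterator shared by both loops
    for intro in source:
        if not intro.startswith('Schedule '):
            continue                      # skip anything outside a schedule
        jobs = []
        for line in source:               # continues from the line after intro
            if line.lstrip().startswith('Total'):
                break                     # end of this schedule
            if line.strip():              # ignore blank separator lines
                jobs.append(line.split())
        schedules.append({'header': intro.split(), 'jobs': jobs})
    return schedules


for sched in parse_schedules(SAMPLE):
    # schedule name and the number of jobs found under it
    print(sched['header'][2], len(sched['jobs']))
```

Splitting each line on whitespace is only a placeholder for whatever field extraction is actually needed.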

--Scott David Daniels
(e-mail address removed)

George Sakkis · Jan 17, 2008

Or if you use this pattern often, you may extract it to a general
grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877:

import re

for line in iterblocks(source,
start = lambda line:
line.startswith('Schedule HOST'),
end = lambda line: re.search(r'^
\s*Total',line),
skip_delim=False):
process(line)

George

George Sakkis · Jan 17, 2008

Or if you use this pattern often, you may extract it to a general
grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877:

Sorry, google groups fscked up with the auto linewrapping (is there a
way to increase the line length?); here it is again:

import re

for line in iterblocks(source,
        start = lambda line: line.startswith('Schedule HOST'),
        end = lambda line: re.search(r'^\s*Total',line),
        skip_delim = False):
    process(line)
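
The recipe itself is not reproduced in this thread. As a minimal sketch of what a grouping generator with this call shape could look like (the name iter_blocks_sketch and its behaviour are assumptions, not the recipe's actual code), the version below yields each block as a list of lines:

```python
import re


def iter_blocks_sketch(lines, start, end, skip_delim=True):
    """Yield lists of lines running from a start line to an end line.

    Hypothetical stand-in, not the cookbook recipe: `start` and `end` are
    predicates taking one line; with skip_delim=True the delimiter lines
    are left out of each block. A block with no end line is dropped.
    """
    block = None                          # None means "not inside a block"
    for line in lines:
        if block is None:
            if start(line):
                block = [] if skip_delim else [line]
        elif end(line):
            if not skip_delim:
                block.append(line)
            yield block
            block = None
        else:
            block.append(line)


# Made-up sample text for illustration only
text = ("noise\n"
        "Schedule HOST #A\n(SITE1 JOB1\nTotal 00:01\n"
        "Schedule HOST #B\n(SITE2 JOB2\n(SITE3 JOB3\n  Total 00:02\n")

blocks = list(iter_blocks_sketch(
    text.splitlines(),
    start=lambda line: line.startswith('Schedule HOST'),
    end=lambda line: re.search(r'^\s*Total', line),
    skip_delim=False))

for block in blocks:
    print(block[0], len(block))
```

Passing the predicates in as arguments keeps the grouping logic reusable for other start and end markers.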

George

robleachza · Jan 17, 2008

I'm very appreciative for the comments posted. Thanks to each of you.
All good stuff.
Cheers,
-Rob
