next line (data parsing)

robleachza

Hi there,
I'm struggling to find a sensible way to process a large chunk of
data line by line, while also being able to move on to subsequent
'next' lines within a for loop. I was hoping someone would be willing
to share some insights to help point me in the right direction. This
is not a file, so any file modules or methods available for file
parsing wouldn't apply.

I run a command on a remote host by using the pexpect (pxssh) module.
The result comes back as pages and pages of pre-formatted text.
This is a pared-down example (some will notice it's Tivoli schedule
output).

....
Job Name Run Time
Pri Start Time Dependencies
Schedule HOST #ALL_LETTERS ( ) 00:01
10 22:00(01/16/08) LTR_CLEANUP

(SITE1 LTR_DB_LETTER 00:01
10
Total 00:01

Schedule HOST #DAILY ( ) 00:44 10
18:00(01/16/08) DAILY_LTR

(SITE3 RUN_LTR14_PROC 00:20
10
(SITE1 LTR14A_WRAPPER 00:06
10 SITE3#RUN_LTR14_PROC
(SITE1 LTR14B_WRAPPER 00:04
10 SITE1#LTR14A_WRAPPER
(SITE1 LTR14C_WRAPPER 00:03
10 SITE1#LTR14B_WRAPPER
(SITE1 LTR14D_WRAPPER 00:02
10 SITE1#LTR14C_WRAPPER
(SITE1 LTR14E_WRAPPER 00:01
10 SITE1#LTR14D_WRAPPER
(SITE1 LTR14F_WRAPPER 00:03
10 SITE1#LTR14E_WRAPPER
(SITE1 LTR14G_WRAPPER 00:03
10 SITE1#LTR14F_WRAPPER
(SITE1 LTR14H_WRAPPER 00:02
10 SITE1#LTR14G_WRAPPER
Total 00:44

Schedule HOST #CARDS ( ) 00:02 10
20:30(01/16/08) STR2_D

(SITE7 DAILY_MEETING_FILE 00:01
10
(SITE3 BEHAVE_HALT_FILE 00:01
10 SITE7#DAILY_HOME_FILE
Total 00:02
....

I can iterate over each line by setting a for loop on the data object;
no problem. But basically my intention is to locate the line "Schedule
HOST" and progressively move on to the 'next' line, parsing out the
pieces I care about, until I hit "Total". Then I resume at the
start of the for loop, which locates the next "Schedule HOST".

I realize this is a really basic problem, but I can't seem to
articulate my intention well enough to find documentation or examples
that have been helpful to me. I bought the Python Cookbook yesterday,
which has gotten me a lot further in some areas, but still hasn't
given me what I'm looking for. This is just a pet project to help me
reduce some of the tedious aspects of my daily tasks, so I've been
using this as a means to discover Python. I appreciate any insights that
would help set me in the right direction.

Cheers,
-Rob
 
Paul McGuire

Pyparsing will work on a string or a file, and will do the
line-by-line iteration for you. You just have to define the expected
format of the data. The sample code below parses the data that you posted.
From this example, you can refine the code by assigning names to the
different parsed fields, and use the field names to access the parsed
values.

More info about pyparsing at http://pyparsing.wikispaces.com.

-- Paul



from pprint import pprint
from pyparsing import *

# basic building blocks
integer = Word(nums)
timestamp = Combine(Word(nums, exact=2) + ":" + Word(nums, exact=2))
dateString = Combine(Word(nums, exact=2) + "/" +
                     Word(nums, exact=2) + "/" +
                     Word(nums, exact=2))

# header: Schedule HOST #NAME ( ) hh:mm pri hh:mm(mm/dd/yy) [dependencies]
schedHeader = (Literal("Schedule HOST") + Word("#", alphas + "_") + "(" + ")" +
               timestamp + integer + timestamp + "(" + dateString + ")" +
               Optional(~LineEnd() + empty + restOfLine))
# job line: (SITE JOB_NAME hh:mm pri [dependencies]
schedLine = Group(Word("(", alphanums) + Word(alphanums + "_") +
                  timestamp + integer +
                  Optional(~LineEnd() + empty + restOfLine)
                  ) + LineEnd().suppress()
schedTotal = Literal("Total") + timestamp

sched = schedHeader + Group(OneOrMore(schedLine)) + schedTotal

# data is the string returned by the remote command
for s in sched.searchString(data):
    pprint(s.asList())
    print()


Prints:

['Schedule HOST',
 '#ALL_LETTERS',
 '(',
 ')',
 '00:01',
 '10',
 '22:00',
 '(',
 '01/16/08',
 ')',
 'LTR_CLEANUP ',
 [['(SITE1', 'LTR_DB_LETTER', '00:01', '10']],
 'Total',
 '00:01']

['Schedule HOST',
 '#DAILY',
 '(',
 ')',
 '00:44',
 '10',
 '18:00',
 '(',
 '01/16/08',
 ')',
 'DAILY_LTR ',
 [['(SITE3', 'RUN_LTR14_PROC', '00:20', '10'],
  ['(SITE1', 'LTR14A_WRAPPER', '00:06', '10', 'SITE3#RUN_LTR14_PROC '],
  ['(SITE1', 'LTR14B_WRAPPER', '00:04', '10', 'SITE1#LTR14A_WRAPPER '],
  ['(SITE1', 'LTR14C_WRAPPER', '00:03', '10', 'SITE1#LTR14B_WRAPPER '],
  ['(SITE1', 'LTR14D_WRAPPER', '00:02', '10', 'SITE1#LTR14C_WRAPPER '],
  ['(SITE1', 'LTR14E_WRAPPER', '00:01', '10', 'SITE1#LTR14D_WRAPPER '],
  ['(SITE1', 'LTR14F_WRAPPER', '00:03', '10', 'SITE1#LTR14E_WRAPPER '],
  ['(SITE1', 'LTR14G_WRAPPER', '00:03', '10', 'SITE1#LTR14F_WRAPPER '],
  ['(SITE1', 'LTR14H_WRAPPER', '00:02', '10', 'SITE1#LTR14G_WRAPPER ']],
 'Total',
 '00:44']

['Schedule HOST',
 '#CARDS',
 '(',
 ')',
 '00:02',
 '10',
 '20:30',
 '(',
 '01/16/08',
 ')',
 'STR2_D ',
 [['(SITE7', 'DAILY_MEETING_FILE', '00:01', '10'],
  ['(SITE3', 'BEHAVE_HALT_FILE', '00:01', '10', 'SITE7#DAILY_HOME_FILE ']],
 'Total',
 '00:02']
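
Until field names are assigned in the grammar, the nested lists can be picked apart by position. The following is only a rough sketch: the `result` list is copied from the last block of output, and `summarize` and its dictionary keys are made-up names for illustration. Positional access is brittle, so treat it as a stopgap rather than the intended way to read pyparsing results.

```python
# One parsed result, copied from the printed output (the '#CARDS' schedule).
result = ['Schedule HOST', '#CARDS', '(', ')', '00:02', '10', '20:30',
          '(', '01/16/08', ')', 'STR2_D ',
          [['(SITE7', 'DAILY_MEETING_FILE', '00:01', '10'],
           ['(SITE3', 'BEHAVE_HALT_FILE', '00:01', '10',
            'SITE7#DAILY_HOME_FILE ']],
          'Total', '00:02']


def summarize(parsed):
    """Pick fields out of one result by position (hypothetical helper)."""
    # The header's trailing field is optional in the grammar, so locate the
    # job group by type instead of assuming a fixed index for it.
    jobs = next(item for item in parsed if isinstance(item, list))
    return {
        'schedule': parsed[1].lstrip('#'),
        'start': parsed[6],
        'total': parsed[-1],
        'jobs': [
            {'site': job[0].lstrip('('),
             'name': job[1],
             'run_time': job[2],
             # the dependency field is optional, so it may be absent
             'depends_on': job[4].strip() if len(job) > 4 else None}
            for job in jobs
        ],
    }


summary = summarize(result)
print(summary['schedule'], [job['name'] for job in summary['jobs']])
```

The fixed indexes (1, 6, -1) rely on the header always having the shape shown in the sample data.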
 
Scott David Daniels

If you can do:

for line in whatever:
    ...

then you can do:

source = iter(whatever)
for intro in source:
    if intro.startswith('Schedule '):
        for line in source:
            if line.startswith('Total'):
                break
            process(intro, line)
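
Since the data is a string rather than a file, `str.splitlines()` is one way to get the iterable of lines. Here is a minimal, self-contained sketch of the same shared-iterator idea. The `data` string is a trimmed copy of the sample from the original post, and `collect_schedules` is a hypothetical stand-in for whatever `process` would really do.

```python
# Trimmed copy of the sample output, standing in for the real command result.
data = """\
Job Name Run Time
Schedule HOST #ALL_LETTERS ( ) 00:01
10 22:00(01/16/08) LTR_CLEANUP

(SITE1 LTR_DB_LETTER 00:01
10
Total 00:01

Schedule HOST #CARDS ( ) 00:02 10
20:30(01/16/08) STR2_D

(SITE7 DAILY_MEETING_FILE 00:01
10
(SITE3 BEHAVE_HALT_FILE 00:01
10 SITE7#DAILY_HOME_FILE
Total 00:02
"""


def collect_schedules(text):
    """Return {header line: [job lines]} for each Schedule ... Total block."""
    schedules = {}
    source = iter(text.splitlines())  # one iterator shared by both loops
    for intro in source:
        if not intro.startswith('Schedule '):
            continue
        jobs = schedules.setdefault(intro, [])
        for line in source:  # picks up where the outer loop left off
            # lstrip() in case the real output indents the Total line
            if line.lstrip().startswith('Total'):
                break
            if line.startswith('('):  # keep only the job lines
                jobs.append(line)
    return schedules


schedules = collect_schedules(data)
for header, jobs in schedules.items():
    print(header, len(jobs))
```

Both loops draw from the same iterator, so when the inner loop breaks on "Total" the outer loop resumes with the line after it and nothing is read twice.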

--Scott David Daniels
(e-mail address removed)
 
George Sakkis

Or if you use this pattern often, you may extract it to a general
grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877:

import re

for line in iterblocks(source,
start = lambda line:
line.startswith('Schedule HOST'),
end = lambda line: re.search(r'^
\s*Total',line),
skip_delim=False):
process(line)


George
 
George Sakkis

Or if you use this pattern often, you may extract it to a general
grouping function such as http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877:

Sorry, google groups fscked up with the auto linewrapping (is there a
way to increase the line length?); here it is again:

import re

# iterblocks is the grouping function from the recipe linked above
for line in iterblocks(source,
        start = lambda line: line.startswith('Schedule HOST'),
        end = lambda line: re.search(r'^\s*Total',line),
        skip_delim = False):
    process(line)
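
For readers without the recipe to hand, a grouping generator along these lines might look roughly like the sketch below. This is not the recipe's code: `iter_blocks_sketch` is a hypothetical name, and yielding one list of lines per block, as well as the meaning given to `skip_delim`, are assumptions made for illustration.

```python
import re


def iter_blocks_sketch(lines, start, end, skip_delim=True):
    """Yield one list of lines per start...end block.

    Hypothetical stand-in, not the implementation from the linked recipe.
    A block that never reaches an end line is silently dropped.
    """
    block = None  # None means "currently outside any block"
    for line in lines:
        if block is None:
            if start(line):
                # keep the start line only when delimiters are wanted
                block = [] if skip_delim else [line]
        elif end(line):
            if not skip_delim:
                block.append(line)
            yield block
            block = None
        else:
            block.append(line)


sample = [
    'Job Name Run Time',
    'Schedule HOST #CARDS ( ) 00:02 10',
    '(SITE7 DAILY_MEETING_FILE 00:01',
    '  Total 00:02',
    'trailing line',
]
blocks = list(iter_blocks_sketch(
    sample,
    start=lambda line: line.startswith('Schedule HOST'),
    end=lambda line: re.search(r'^\s*Total', line),
    skip_delim=False))
print(blocks)
```

Passing the start and end tests as callables keeps the grouping logic independent of this particular report format.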


George
 
robleachza

I'm very appreciative of the comments posted. Thanks to each of you.
All good stuff.
Cheers,
-Rob
 
