pyparsing question: single word values with a double quoted string every once in a while

hubritic

I want to parse a log that has entries like this:

[2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
rcpts=1
routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
elapsed=0.280


The keywords will not always be the same, and different log levels
will produce a different mix of keywords.

This is good enough to get the majority of cases where there is a
keyword, a "=" and then a value with no spaces:

Group(Word(alphas + "+_-.").setResultsName("keyword") + Suppress(Literal("=")) + Optional(Word(printables)))

Sometimes there is a subject, which is a quoted string. That is easy
enough to get with this:
dblQuotedString(ZeroOrMore(Word(printables) ) )

My problem is combining them into one expression. Either I wind up
with just the subject, or I wind up with the keywords and their
values, one of which is:

subject, '"I'

which is clearly not what I want.

Do I scan each line twice, first looking for quotes?

Thanks
 
Piet van Oostrum

hubritic said:
h> I want to parse a log that has entries like this:
h> [2009-03-17 07:28:05.545476 -0500] rprt s=d2bpr80d6 m=2 mod=mail
h> cmd=msg module=access rule=x_dynamic_ip action=discard attachments=0
h> rcpts=1
h> routes=DL_UK_ALL,NOT_DL_UK_ALL,default_inbound,firewallsafe,mail01_mail02,spfsafe
h> size=4363 guid=291f0f108fd3a6e73a11f96f4fb9e4cd hdr_mid=
h> qid=n2HCS4ks025832 subject="I want to interview you" duration=0.236
h> elapsed=0.280

h> The keywords will not always be the same, and different log levels
h> will produce a different mix of keywords.
h> This is good enough to get the majority of cases where there is a
h> keyword, a "=" and then a value with no spaces:
h> Group(Word(alphas + "+_-.").setResultsName("keyword") + Suppress(Literal("=")) + Optional(Word(printables)))
h> Sometimes there is a subject, which is a quoted string. That is easy
h> enough to get with this:
h> dblQuotedString(ZeroOrMore(Word(printables) ) )
h> My problem is combining them into one expression. Either I wind up
h> with just the subject, or I wind up with the keywords and their
h> values, one of which is:
h> subject, '"I'
h> which is clearly not what I want.
h> Do I scan each line twice, first looking for quotes?


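Use MatchFirst (the | operator). It tries the alternatives from left to
right and takes the first one that matches, so put dblQuotedString ahead
of the catch-all Word(printables).

I have also split it up to make it more readable: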

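from pyparsing import (Group, Literal, Optional, Suppress, Word,
                       alphas, dblQuotedString, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))
# Try a double-quoted string first, then fall back to a bare word.
value = dblQuotedString | Optional(Word(printables))

pattern = Group(kw + eq + value)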
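
To see what the ordering buys you, here is a minimal sketch that runs both versions over part of the sample entry using scanString. The names naive, line and pairs() are only illustrative, and it assumes every keyword on the line has a value, so it does not cover an empty one such as hdr_mid=.

```python
from pyparsing import (Group, Literal, Optional, Suppress, Word,
                       alphas, dblQuotedString, printables)

kw = Word(alphas + "+_-.").setResultsName("keyword")
eq = Suppress(Literal("="))

# Original attempt: a value is just a run of printable characters.
naive = Group(kw + eq + Optional(Word(printables)))

# Suggested fix: try the quoted string first, then the bare word.
value = dblQuotedString | Optional(Word(printables))
pattern = Group(kw + eq + value)

# Part of the sample log entry (illustrative subset).
line = ('qid=n2HCS4ks025832 subject="I want to interview you" '
        'duration=0.236 elapsed=0.280')


def pairs(expr, text):
    """Collect (keyword, value) tuples for every match of expr in text."""
    # scanString yields (tokens, start, end); tokens[0] is the Group.
    return [(tokens[0][0], tokens[0][1])
            for tokens, start, end in expr.scanString(text)]


print(pairs(naive, line))
# [('qid', 'n2HCS4ks025832'), ('subject', '"I'),
#  ('duration', '0.236'), ('elapsed', '0.280')]

print(pairs(pattern, line))
# [('qid', 'n2HCS4ks025832'), ('subject', '"I want to interview you"'),
#  ('duration', '0.236'), ('elapsed', '0.280')]
```

The quoted value keeps its surrounding double quotes, so strip them afterwards if you only want the text.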
 
