multi regexp analyzer? or how to do...

joh12005

Hello,

Here is a problem I have. I would like to solve it with Python,
even though I still have no clue how to do it.

I had many small "text" files, so to speed up processing I copied
them into one huge file, adding some kind of XML separator:

<file name="...">
[content]
</file>

The content is tab-separated data (columns); the values are strings.
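
For illustration, here is a minimal sketch of how such a combined file could be read back one "file" at a time. It assumes the separator lines look exactly like the ones shown above and that each sits on its own line; `iter_files` is a made-up helper name, not something from the original post.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumption: separator lines look exactly like <file name="..."> and </file>.
OPEN_TAG = re.compile(r'<file name="([^"]*)">\s*$')
CLOSE_TAG = "</file>"


def iter_files(lines: Iterable[str]) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (name, rows) per <file> block; each row is a list of column strings."""
    name = None
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = OPEN_TAG.match(line)
        if m:
            # Start of a new embedded file: remember its name, reset the rows.
            name, rows = m.group(1), []
        elif line.strip() == CLOSE_TAG:
            if name is not None:
                yield name, rows
            name = None
        elif name is not None:
            # Ordinary data line: split the tab-separated columns.
            rows.append(line.split("\t"))
```

Since it accepts any iterable of lines, an open file object can be passed directly, so the huge file never has to be loaded in one piece.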

Now here comes the tricky part for me:

I would like to be able to create some kind of matching rules using
regular expressions. A rule should match data on one line (the smallest
data unit for me) or on a set of lines, for example:

if on this line the first column matches this regexp and the second
column matches too,
and on the following line the third column matches,
-> trigger something
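
One possible way to write such a rule down, as a sketch only: a rule is a list of steps, one per line, and each step maps a column index to a compiled regexp. The names `Step`, `step_matches`, `rule_matches_at` and the patterns in `example_rule` are invented for the example.

```python
import re
from typing import Dict, List, Pattern, Sequence

# One step = the conditions for one line: {column index: compiled regexp}.
Step = Dict[int, Pattern[str]]


def step_matches(step: Step, row: Sequence[str]) -> bool:
    """True if every listed column exists in the row and its regexp is found in it."""
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())


def rule_matches_at(rule: List[Step], rows: Sequence[Sequence[str]], start: int) -> bool:
    """True if the steps of the rule match rows[start], rows[start + 1], ..."""
    if start + len(rule) > len(rows):
        return False
    return all(step_matches(step, rows[start + i]) for i, step in enumerate(rule))


# The example above, with made-up patterns (columns are 0-based here):
example_rule = [
    {0: re.compile(r"^foo"), 1: re.compile(r"\d+")},  # this line: columns 1 and 2
    {2: re.compile(r"bar$")},                          # following line: column 3
]
```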

So, here is what I tried:

- take all the rules,
- build some kind of analyzer for each rule,
- keep the size L of the longest one,
- then read the lines of the huge file one by one,
- inside a "file", create all the subsets of length <= L,
- for each analyzer, see if it matches any of the subsets,
- if it does...

My trouble is here:

"for each analyzer, see if it matches any of the subsets"

It is really too slow. I have many, many rules, and since it is a "for
loop inside a for loop", with another "for loop over the subset's
lines" inside each rule, I need to speed it up. Do you have any ideas?

I am thinking of having only "rules for one line" and keeping track of
whether a rule is an "ending one" (to trigger something) or a "must
continue" one, but it is still unclear to me for now...
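
A rough sketch of that idea, under the same assumed rule layout (a list of per-line steps, each a dict of column index to compiled regexp): read each line once and carry along the rules that are partly matched, instead of building subsets. `scan` is a hypothetical name, and every rule is assumed to have at least one step.

```python
import re
from typing import Dict, Iterable, List, Pattern, Sequence, Tuple

Step = Dict[int, Pattern[str]]  # {column index: compiled regexp} for one line


def step_matches(step: Step, row: Sequence[str]) -> bool:
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())


def scan(rules: Dict[str, List[Step]], rows: Iterable[Sequence[str]]) -> List[Tuple[str, int]]:
    """Return (rule name, index of the last matched line) for every completed rule."""
    hits: List[Tuple[str, int]] = []
    active: List[Tuple[str, int]] = []  # (rule name, index of the next step to test)
    for lineno, row in enumerate(rows):
        # Partly matched rules continue here, and every rule may also start here.
        candidates = active + [(name, 0) for name in rules]
        active = []
        for name, pos in candidates:
            steps = rules[name]
            if not step_matches(steps[pos], row):
                continue  # this partial match stops here
            if pos + 1 == len(steps):
                hits.append((name, lineno))     # "ending" step: trigger something
            else:
                active.append((name, pos + 1))  # "must continue" on the next line
    return hits
```

Called once per embedded "file", this tests each line against the first step of every rule plus the pending steps, so the inner loop over subsets goes away. The first steps are still tried one by one on every line.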

Some sort of dict with regexp keys would also have been a great
thing...
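
Python's dict has no regexp keys, but something close can be sketched with the standard `re` module: join the patterns into one alternation of named groups and read `lastgroup` from the match to see which one hit. `build_dispatcher` is an invented helper. The sketch assumes the names are valid Python identifiers and that the patterns contain no numbered backreferences or named groups of their own.

```python
import re
from typing import Callable, Dict, Optional


def build_dispatcher(patterns: Dict[str, str]) -> Callable[[str], Optional[str]]:
    """Combine named patterns into one regexp and return a lookup function."""
    combined = re.compile(
        "|".join(f"(?P<{name}>{pat})" for name, pat in patterns.items())
    )

    def lookup(text: str) -> Optional[str]:
        # One search over the text instead of one search per pattern.
        m = combined.search(text)
        return m.lastgroup if m else None

    return lookup
```

This reports a single name per call. If several patterns could match the same text, the one matching leftmost wins, and on a tie the one listed first, so it only fits where one answer per column is enough.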

(And actually it would be great if I could also use some kind of regexp
operator to say that the content of 0 to n lines can be skipped before
matching, as if in the example I had changed "following..." to
"skip at least 2 lines and match third column on next line". It would
be great, but I still have really no idea how to even think about
that.)
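
One way to think about the skip is to treat it as an extra kind of step inside a rule. The sketch below assumes rules are lists of per-line steps (dicts of column index to compiled regexp) and introduces a hypothetical `Skip(lo, hi)` marker meaning "skip between lo and hi whole lines"; it simply tries each allowed gap in turn.

```python
import re
from typing import Dict, List, Pattern, Sequence, Union


class Skip:
    """Hypothetical marker: skip between `lo` and `hi` whole lines before the next step."""

    def __init__(self, lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi


Step = Dict[int, Pattern[str]]  # {column index: compiled regexp} for one line


def step_matches(step: Step, row: Sequence[str]) -> bool:
    return all(col < len(row) and rx.search(row[col]) for col, rx in step.items())


def match_from(rule: List[Union[Step, Skip]], rows: Sequence[Sequence[str]],
               start: int, pos: int = 0) -> bool:
    """True if rule[pos:] matches the rows beginning at index `start`."""
    if pos == len(rule):
        return True
    item = rule[pos]
    if isinstance(item, Skip):
        # Try every allowed gap size, shortest first.
        return any(match_from(rule, rows, start + n, pos + 1)
                   for n in range(item.lo, item.hi + 1))
    if start >= len(rows) or not step_matches(item, rows[start]):
        return False
    return match_from(rule, rows, start + 1, pos + 1)
```

With this, "skip at least 2 lines" with an upper bound of, say, 5 would be written as `Skip(2, 5)` between two steps. Trying every gap can get expensive when a rule has several wide skips, so this is meant as a starting point rather than a fast solution.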

Many thanks to anybody who can help,

Best
 
Paul McGuire

I'd propose a pyparsing implementation, but you don't give us many
specifics. Is there any chance you could post some sample data, and
one or two of the regexps you are using for matching?

-- Paul
 
