Multi-regexp analyzer? Or how to do it...

joh12005 · Jun 30, 2005

Hello,

Here is a problem I have. I would like to solve it with Python,
even though I still have no clue how to do it.

I had many small "text" files, so to speed up processing them, I copied
them all into one huge file, adding some kind of XML separator:

<file name="...">
[content]
</file>

The content is tab-separated data (columns); the data are strings.
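To make the rest concrete, here is one way the combined file described above could be read back into (file name, rows) pairs. It is a minimal sketch that assumes each `<file name="...">` and `</file>` separator sits alone on its line and that file names contain no double quotes; `iter_files`, `OPEN_TAG` and `CLOSE_TAG` are hypothetical names, not from the original post.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Assumes each separator sits on its own line, exactly as in the example:
#   <file name="...">  ...rows...  </file>
OPEN_TAG = re.compile(r'^<file name="([^"]*)">$')
CLOSE_TAG = "</file>"

def iter_files(lines: Iterable[str]) -> Iterator[Tuple[str, List[List[str]]]]:
    """Yield (file name, rows) pairs; each row is the list of tab-separated columns."""
    name = None
    rows: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        opened = OPEN_TAG.match(line)
        if opened:                      # start of a new embedded file
            name, rows = opened.group(1), []
        elif line == CLOSE_TAG:         # end of the current embedded file
            if name is not None:
                yield name, rows
            name = None
        elif name is not None:          # a data line inside a file
            rows.append(line.split("\t"))

combined = [
    '<file name="a.txt">\n',
    'foo\tbar\tbaz\n',
    'x\ty\tz\n',
    '</file>\n',
]
print(list(iter_files(combined)))
# [('a.txt', [['foo', 'bar', 'baz'], ['x', 'y', 'z']])]
```

With a real file, an open file object can be passed instead of the list; lines outside any `<file>` block are ignored.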

Now here comes the tricky part for me:

I would like to be able to create some kind of matching rules using
regular expressions. Rules should match data on one line (the smallest
data unit for me) or on a set of lines, for example:

if on this line the first column matches this regexp and the second
column matches that one,
and on the following line the third column matches another one,
-> trigger something
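A rule like the one above could be represented as a list of steps, one per consecutive line, where each step maps a column index (0-based, so the "first column" is 0) to a compiled regexp. This is a minimal sketch under that assumption; `line_matches`, `rule_matches_at` and the sample patterns are hypothetical, and it uses `Pattern.search` (match anywhere in the column), which you may want to swap for `Pattern.fullmatch`.

```python
import re
from typing import Dict, List

# Hypothetical representation: a rule is a list of steps, one per consecutive
# line; each step maps a 0-based column index to the regexp that column must match.
Step = Dict[int, re.Pattern]
Rule = List[Step]

def line_matches(step: Step, row: List[str]) -> bool:
    """True if every constrained column exists in the row and matches its regexp."""
    return all(col < len(row) and pat.search(row[col]) for col, pat in step.items())

def rule_matches_at(rule: Rule, rows: List[List[str]], start: int) -> bool:
    """True if the rule's steps match rows[start], rows[start + 1], ... in order."""
    if start + len(rule) > len(rows):
        return False
    return all(line_matches(step, rows[start + k]) for k, step in enumerate(rule))

# The example from the post: columns 1 and 2 on one line, column 3 on the next.
example_rule: Rule = [
    {0: re.compile(r"^foo"), 1: re.compile(r"bar$")},
    {2: re.compile(r"\d+")},
]
rows = [["foo1", "xbar", "-"], ["a", "b", "42"]]
print(rule_matches_at(example_rule, rows, 0))  # True
```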

So, here is what I have tried:

- gather all the rules,
- build some kind of analyzer for each rule,
- keep the length L of the longest one,
- then read each line of the huge file one by one,
- inside a "file", create all the subsets of consecutive lines with length <= L,
- for each analyzer, check whether it matches any of the subsets,
- if it does...
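The approach in the list above could look roughly like this. Instead of materialising every subset, it takes the window of at most L lines starting at each line and tries every rule on that window's first lines, which covers every run of consecutive lines a rule could match. It is a sketch assuming rules are lists of `{column index: compiled regexp}` steps; `naive_scan` is a hypothetical name. The nested loops make the cost roughly lines × rules × steps regexp evaluations.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule shape: a list of steps (one per consecutive line), each
# mapping a 0-based column index to a compiled regexp.
Rule = List[Dict[int, re.Pattern]]

def line_matches(step: Dict[int, re.Pattern], row: List[str]) -> bool:
    return all(col < len(row) and pat.search(row[col]) for col, pat in step.items())

def naive_scan(rules: Dict[str, Rule], rows: List[List[str]]) -> List[Tuple[int, str]]:
    """Return (start line, rule name) for every match: lines x rules x steps."""
    hits = []
    longest = max((len(r) for r in rules.values()), default=0)  # L in the post
    for start in range(len(rows)):
        window = rows[start:start + longest]      # the lines starting here
        for name, rule in rules.items():           # for each analyzer...
            if len(rule) <= len(window) and all(
                line_matches(step, window[k]) for k, step in enumerate(rule)
            ):
                hits.append((start, name))
    return hits

rules = {
    "r1": [{0: re.compile("a")}, {1: re.compile("b")}],
    "r2": [{0: re.compile("z")}],
}
rows = [["a", "x"], ["q", "b"], ["z"]]
print(naive_scan(rules, rows))  # [(0, 'r1'), (2, 'r2')]
```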

My trouble is here:

"for each analyzer, see if it matches any of the subsets"

It is really too slow. I have many, many rules, and it is a "for loop
inside a for loop", plus inside each rule there is another "for loop over
the subset's lines". I need to speed this up; do you have any idea?
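One common way to cut that cost, assuming many rules reuse the same (column, regexp) tests, is to run each distinct test only once per line, store which tests passed, and let the rules check those stored results with cheap set operations instead of re-running regexps. This is a sketch of that idea, not a measured improvement; `compile_tests`, `passed_tests` and `cached_scan` are hypothetical names, and here a step is a list of `(column index, regexp source)` pairs.

```python
import re
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical rule shape: a rule is a list of steps (one per consecutive line)
# and a step is a list of tests; a test is (0-based column index, regexp source).
Test = Tuple[int, str]
Rule = List[List[Test]]

def compile_tests(rules: Dict[str, Rule]) -> Dict[Test, re.Pattern]:
    """Compile every distinct test once, however many rules share it."""
    return {t: re.compile(t[1]) for rule in rules.values() for step in rule for t in step}

def passed_tests(row: List[str], compiled: Dict[Test, re.Pattern]) -> FrozenSet[Test]:
    """Run each distinct test once on one line; return the tests that matched."""
    return frozenset(
        t for t, pat in compiled.items() if t[0] < len(row) and pat.search(row[t[0]])
    )

def cached_scan(rules: Dict[str, Rule], rows: List[List[str]]) -> List[Tuple[int, str]]:
    """Regexps only run in the per-line pass; the rule loop just does set checks."""
    compiled = compile_tests(rules)
    per_line = [passed_tests(row, compiled) for row in rows]
    hits = []
    for start in range(len(rows)):
        for name, rule in rules.items():
            # A step holds on a line if all its tests are in that line's passed set.
            if start + len(rule) <= len(rows) and all(
                set(step) <= per_line[start + k] for k, step in enumerate(rule)
            ):
                hits.append((start, name))
    return hits

rules = {"r1": [[(0, "a")], [(1, "b")]], "r2": [[(0, "a")]]}
rows = [["a", "x"], ["q", "b"]]
print(cached_scan(rules, rows))  # [(0, 'r1'), (0, 'r2')]
```

Here `(0, "a")` is compiled and run once per line even though both rules use it; the saving grows with the amount of sharing between rules.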

I am thinking of having only one-line rules and keeping track of whether
a rule is an "ending" one (to trigger something) or a "must continue"
one, but it is still unclear to me for now...
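The "ending" / "must continue" idea could be sketched as a single pass over the lines that carries a list of partial matches forward: each line may start a new partial match for any rule, extend a pending one ("must continue"), complete one (the "ending" step, which triggers), or break one (it is dropped). This assumes every rule is a non-empty list of one-line steps on consecutive lines; `stream_scan` and the tuple layout are hypothetical. Each line is read only once and later steps are only tested for live partial matches, but the first step of every rule is still tested on every line.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical rule shape: a non-empty list of steps, one per consecutive line;
# each step maps a 0-based column index to a compiled regexp.
Rule = List[Dict[int, re.Pattern]]

def line_matches(step: Dict[int, re.Pattern], row: List[str]) -> bool:
    return all(col < len(row) and pat.search(row[col]) for col, pat in step.items())

def stream_scan(rules: Dict[str, Rule], rows: List[List[str]]) -> List[Tuple[int, str]]:
    """One pass over the lines, carrying the partial matches that must continue."""
    hits = []
    active: List[Tuple[str, int, int]] = []   # (rule name, start line, next step)
    for lineno, row in enumerate(rows):
        # Every rule may also start on this line, as a partial match at step 0.
        candidates = active + [(name, lineno, 0) for name in rules]
        active = []
        for name, start, step_idx in candidates:
            rule = rules[name]
            if not line_matches(rule[step_idx], row):
                continue                                   # chain broken: drop it
            if step_idx + 1 == len(rule):
                hits.append((start, name))                 # "ending" step: trigger
            else:
                active.append((name, start, step_idx + 1))  # "must continue"
    return hits

rules = {
    "r1": [{0: re.compile("a")}, {1: re.compile("b")}],
    "r2": [{0: re.compile("z")}],
}
rows = [["a", "x"], ["q", "b"], ["z"]]
print(stream_scan(rules, rows))  # [(0, 'r1'), (2, 'r2')]
```

Hits come out in the order their last line is reached, which is what you want if "trigger something" should happen as soon as a rule completes.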

Another great thing would have been some sort of dict with regexp
keys...
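Python has no built-in dict with regexp keys, but a small wrapper can give that feel: setting a key compiles the pattern, and a lookup returns every value whose pattern matches the string. This is a minimal sketch; `RegexpDict` and `matches` are hypothetical names. A lookup is still a linear scan over all patterns, and joining the patterns into one big `a|b|c` alternation would not replace it, because the `re` module reports only one matching alternative per match, not all of them.

```python
import re
from typing import Any, Dict, List, Tuple

class RegexpDict:
    """Hypothetical 'dict with regexp keys': a lookup returns the values of
    every key pattern that matches the string (checked one by one)."""

    def __init__(self) -> None:
        # pattern source -> (compiled pattern, value); insertion order is kept
        self._items: Dict[str, Tuple[re.Pattern, Any]] = {}

    def __setitem__(self, pattern: str, value: Any) -> None:
        self._items[pattern] = (re.compile(pattern), value)

    def matches(self, text: str) -> List[Any]:
        """Values whose pattern matches anywhere in text, in insertion order."""
        return [value for pat, value in self._items.values() if pat.search(text)]

d = RegexpDict()
d[r"^foo"] = "rule A"
d[r"\d+$"] = "rule B"
print(d.matches("foo42"))  # ['rule A', 'rule B']
print(d.matches("bar"))    # []
```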

(And actually it would be great if I could also use some kind of regexp
operator saying that the content of 0 to n lines may be skipped before
matching. For instance, in the example I would change "following line..."
to "skip at least 2 lines, then match the third column on the next line".
It would be great, but I still have no idea how to even think about
that.)
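A "skip 0 to n lines" operator could be modelled by giving each step a gap: the minimum and maximum number of lines to skip before it (`None` meaning no upper bound). The sketch below uses simple backtracking, similar to how a backtracking regexp engine handles `.{m,n}`; the step layout and the names `match_from` and `find_matches` are hypothetical, and backtracking can become slow with many unbounded gaps. The example encodes the modified rule from the post: the first and second columns (indexes 0 and 1) on one line, skip at least 2 lines, then the third column (index 2) on the next line.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical step layout: (min lines to skip, max lines to skip or None for
# "no limit", {column index: compiled regexp}). The first step uses (0, 0, ...)
# so that it is anchored on the start line.
Step = Tuple[int, Optional[int], Dict[int, re.Pattern]]

def line_matches(cols: Dict[int, re.Pattern], row: List[str]) -> bool:
    return all(c < len(row) and p.search(row[c]) for c, p in cols.items())

def match_from(steps: List[Step], rows: List[List[str]], pos: int) -> bool:
    """Backtracking: can the remaining steps match, counting the gap from rows[pos]?"""
    if not steps:
        return True
    min_skip, max_skip, cols = steps[0]
    last = len(rows) - 1 if max_skip is None else min(len(rows) - 1, pos + max_skip)
    for i in range(pos + min_skip, last + 1):   # try every allowed gap length
        if line_matches(cols, rows[i]) and match_from(steps[1:], rows, i + 1):
            return True
    return False

def find_matches(steps: List[Step], rows: List[List[str]]) -> List[int]:
    """Return the start lines where the whole rule matches."""
    return [start for start in range(len(rows)) if match_from(steps, rows, start)]

rule = [
    (0, 0, {0: re.compile(r"^foo"), 1: re.compile(r"bar")}),  # this line
    (2, None, {2: re.compile(r"\d")}),                        # skip >= 2, then match
]
rows = [["foo", "bar", ""], ["", "", "1"], ["", "", ""], ["", "", "7"]]
print(find_matches(rule, rows))  # [0]
```

The digit on the line right after the start does not count, because the rule requires at least two skipped lines before the third column is checked.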

Many thanks to anybody who can help,

best

Paul McGuire · Jun 30, 2005

I'd propose a pyparsing implementation, but you don't give us many
specifics. Is there any chance you could post some sample data, and
one or two of the regexps you are using for matching?

-- Paul

How do I fix this issue in sqaurespace code block?	1	Jul 2, 2024
RegExp - Match specific words, but not if they're inside parenthesis (with or without other words within)	6	Jan 29, 2023
Multivendor marketplace for music sheets. Do I need to learn how to code ? or shopify ?	0	Jan 29, 2022
How do I use Find and Loop in VBA for Excel to identify, delete, and insert blank row for values greater than 6?	0	Feb 28, 2022
How do I solidify my Python skills	1	Sep 15, 2023
[C++] Pointers declared inside a function, how do I manage them?	5	May 3, 2023
small regexp help	1	Oct 30, 2013
How do i get numberOfItemsHired to only accept 1-500 if it is outside those values error message should be displayed	10	Jul 5, 2024
