Large regular expressions

Nathan Harmston

Hi,

So I'm trying to use a very large regular expression. Basically, I have
a list of items I want to find in text; it's kind of a conjunction of
two regular expressions and a big list... not pretty. However,
every time I try to run my code I get this exception:

OverflowError: regular expression code size limit exceeded

I understand that there is a Python-imposed limit on the size of the
regular expression. And although it's not nice, I have a machine with
12 GB of RAM just waiting to be used. Is there any way I can alter
Python to allow big regular expressions?

Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my swigged alphabet trie is faster than
what's possible in Python!

Many thanks,


Nathan
 
Alain Ketterlin

[...]
Could anyone suggest other methods for this kind of string matching in
Python? I'm trying to see if my swigged alphabet trie is faster than
what's possible in Python!

Since you mention using a trie, I guess it's just a big alternation of
fixed strings. You may want to try using the Aho-Corasick variant. It
looks like there are several implementations (Google finds at least
two). I would be surprised if any pure-Python solution were faster than
tries implemented in C. Don't forget to tell us your findings.
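
For illustration only, here is a minimal pure-Python sketch of Aho-Corasick
matching over a list of fixed strings. It is not one of the implementations
mentioned above, and the names build_automaton and find_all are made up for
this example. It will most likely be slower than a trie written in C, but it
does not go through the re module at all, so the regular expression size
limit does not apply.

```python
from collections import deque


def build_automaton(words):
    """Build trie transitions, failure links and output lists (hypothetical helper)."""
    goto = [{}]   # goto[node][char] -> child node
    fail = [0]    # fail[node] -> node for the longest proper suffix that is in the trie
    out = [[]]    # out[node] -> words that end at this node

    # Phase 1: insert every word into the trie.
    for word in words:
        if not word:
            continue  # skip empty strings, they would match everywhere
        node = 0
        for ch in word:
            nxt = goto[node].get(ch)
            if nxt is None:
                nxt = len(goto)
                goto[node][ch] = nxt
                goto.append({})
                fail.append(0)
                out.append([])
            node = nxt
        out[node].append(word)

    # Phase 2: breadth-first pass to fill in failure links.
    queue = deque(goto[0].values())  # depth-1 nodes keep fail == 0
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # Inherit the words that end at the failure target.
            out[child] = out[child] + out[fail[child]]
            queue.append(child)
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, word) for every occurrence, overlaps included."""
    goto, fail, out = automaton
    node = 0
    matches = []
    for i, ch in enumerate(text):
        while node and ch not in goto[node]:
            node = fail[node]
        node = goto[node].get(ch, 0)
        for word in out[node]:
            matches.append((i - len(word) + 1, word))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(find_all("ushers", automaton))
```

The text is scanned once regardless of how many strings are in the list, and
overlapping hits are all reported, which a single regex alternation would not
do.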

-- Alain.
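
Another option that stays within the re module is to split the list into
several smaller alternations and run each one over the text. The sketch below
assumes the items are plain fixed strings; the helper names and the default
chunk_size of 1000 are arbitrary choices for illustration, and whether a given
chunk size stays under the limit depends on the Python version and on the
items themselves.

```python
import re


def compile_in_chunks(words, chunk_size=1000):
    """Compile the word list as several smaller alternations (hypothetical helper)."""
    # Longest first, so a longer item is preferred over its own prefix
    # when both land in the same chunk.
    ordered = sorted(set(words), key=len, reverse=True)
    patterns = []
    for start in range(0, len(ordered), chunk_size):
        chunk = ordered[start:start + chunk_size]
        patterns.append(re.compile("|".join(re.escape(w) for w in chunk)))
    return patterns


def find_all_chunked(text, patterns):
    """Run every chunk over the text and merge the hits by position."""
    hits = []
    for pattern in patterns:
        hits.extend((m.start(), m.group()) for m in pattern.finditer(text))
    return sorted(hits)


patterns = compile_in_chunks(["cat", "dog", "bird"], chunk_size=2)
print(find_all_chunked("a dog saw a cat and a bird", patterns))
```

Note that the text is scanned once per chunk, and that hits coming from
different chunks may overlap each other, so the result is not identical to
what one giant alternation would return.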
 
