Yet another RE question

Bogdan Marinescu

Hello all,

First, I apologize if this has been discussed before; I couldn't find an answer anywhere. I'm writing a simple compiler for a small language using Spark (http://pages.cpsc.ucalgary.ca/~aycock/spark/), and I just found out that regular expressions in Python follow the Perl semantics (first-then-longest) instead of the POSIX semantics (longest match). This is quite annoying for me: while some solutions to this problem exist and are shown in the Spark documentation, I have some background with lex/yacc and would really like to use the "lex" semantics (longest match). Is there a package for Python that implements this behaviour?
Thank you,

Bogdan
 
Christos TZOTZIOY Georgiou

[snip: this is about a simple compiler for a small language, and Python
follows Perl re semantics instead of POSIX: a|ab always matches 'a' even
if 'ab' would match in the search string]
This is quite annoying for me; while some solutions to this problem exist and they are shown in the Spark documentation, I have some background with lex/yacc and I would really like to use the "lex" semantics (longest match). Is there a package for Python that implements this behaviour?

AFAIK no, there is no such package. However, you can do the following
things:

- reorder alternations (sp?) to be longest first

Substitute "ab|a" for "a|ab"

- the (?!...) operator (negative lookahead) might help

The re "if(?![a-z_0-9])" would match the keyword 'if' but would not match
the first two characters of a longer identifier that starts with 'if'.
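
Both suggestions can be tried out with the standard re module. This is only a minimal sketch: the variable names and the sample strings ("if (x)", "iffy") are made up for illustration and are not taken from Spark or from the original poster's grammar.

```python
import re

# Perl-style alternation: the first alternative that matches wins,
# even though a later alternative would match more of the input.
first_wins = re.match(r"a|ab", "ab").group()

# Workaround 1: list the longer alternative first.
longest_first = re.match(r"ab|a", "ab").group()

# Workaround 2: a negative lookahead stops the keyword pattern from
# matching the beginning of a longer identifier.
keyword_if = re.compile(r"if(?![a-z_0-9])")

# Hypothetical sample inputs: a real keyword, then an identifier.
matches_keyword = keyword_if.match("if (x)") is not None
matches_identifier = keyword_if.match("iffy") is not None

print(first_wins, longest_first, matches_keyword, matches_identifier)
```

Note that the character class above only covers lower-case letters, digits and the underscore; if the language allows upper-case letters in identifiers, the class would need A-Z as well.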

HTH.
 
