Comparison of parsers in python?

Simon Forman · Sep 21, 2009

This is more or less just a list of parsers. I would like some detailed
guidelines on which one to choose for various parsing problems.

Regards,
Peng

Check out http://navarra.ca/?p=538

(FWIW I like and use SPARK, but I haven't used or tried many others.)

Nobody · Sep 21, 2009

I did a Google search and found various parsers in Python that can be
used to parse different files in various situations. I don't see a page
that summarizes and compares all the available parsers in Python, from
simple and easy-to-use ones to complex and powerful ones.

I am wondering if somebody could list all the available parsers and
compare them.

I have a similar question.

What I want: a tokeniser generator which can take a lex-style grammar (not
necessarily lex syntax, but a set of token specifications defined by
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of
bytes. It must allow the syntax to be defined at run-time.

What I don't want: anything written by someone who doesn't understand the
field (i.e. anything which doesn't use a DFA).
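
To make the requirement concrete, here is a minimal sketch of the "run the DFA on sequences of bytes" half only. The transition table is written by hand rather than generated from REs, and every name in it (char_class, TRANSITIONS, ACCEPTING, tokenize) is hypothetical and not taken from any of the libraries mentioned in this thread. Because the table is plain data, it could just as well be built at run-time.

```python
# Hand-written DFA over byte values. All names here are illustrative only.

def char_class(b):
    """Map one byte value (an int in Python 3) to a coarse input class."""
    if 48 <= b <= 57:
        return "digit"
    if 97 <= b <= 122:
        return "lower"
    if b == 46:
        return "dot"
    if b == 44:
        return "comma"
    if b in (32, 9, 10, 13):
        return "space"
    return "other"

# (state, input class) -> next state. A missing entry means "no transition".
TRANSITIONS = {
    ("start", "digit"): "int",
    ("start", "lower"): "word",
    ("start", "dot"): "punct",
    ("start", "comma"): "punct",
    ("start", "space"): "space",
    ("int", "digit"): "int",
    ("int", "dot"): "int_dot",
    ("int_dot", "digit"): "frac",
    ("frac", "digit"): "frac",
    ("word", "lower"): "word",
    ("space", "space"): "space",
}

# Accepting states and the token each one yields (None = discard the match).
ACCEPTING = {
    "int": "NUMBER",
    "frac": "NUMBER",
    "word": "WORD",
    "punct": "PUNCT",
    "space": None,
}

def tokenize(data):
    """Split a bytes object into (token name, bytes) pairs by longest match."""
    tokens = []
    pos = 0
    while pos < len(data):
        state = "start"
        last_accept = None  # (end offset, token name) of the longest match so far
        i = pos
        while i < len(data):
            state = TRANSITIONS.get((state, char_class(data[i])))
            if state is None:
                break
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise ValueError("no token matches at byte offset %d" % pos)
        end, name = last_accept
        if name is not None:
            tokens.append((name, data[pos:end]))
        pos = end  # back up to the end of the last accepted prefix
    return tokens
```

For the input b"42." the scanner reads the dot while looking for a fraction, finds none, and backs up to emit 42 as a NUMBER followed by a PUNCT. That backing up is the usual longest-match rule of lex-style scanners.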

greg · Sep 22, 2009

Nobody said:
What I want: a tokeniser generator which can take a lex-style grammar (not
necessarily lex syntax, but a set of token specifications defined by
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of
bytes. It must allow the syntax to be defined at run-time.

You might find my Plex package useful:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

It was written some time ago, so it doesn't know about
the new bytes type yet, but it shouldn't be hard to
adapt it for that if you need to.

What I don't want: anything written by someone who doesn't understand the
field (i.e. anything which doesn't use a DFA).

Plex uses a DFA.

Nobody · Sep 25, 2009

You might find my Plex package useful:

http://www.cosc.canterbury.ac.nz/greg.ewing/python/Plex/

I haven't had time to play around with this yet, but it appears to be
essentially what I'm looking for.

andrew cooke · Sep 26, 2009

I have a similar question.

What I want: a tokeniser generator which can take a lex-style grammar (not
necessarily lex syntax, but a set of token specifications defined by
REs, BNF, or whatever), generate a DFA, then run the DFA on sequences of
bytes. It must allow the syntax to be defined at run-time.

What I don't want: anything written by someone who doesn't understand the
field (i.e. anything which doesn't use a DFA).

lepl will do this, but it's integrated with the rest of the parser
(which is recursive descent).

for example:

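from lepl import *

# two tokens built from lepl matchers, one from a regexp
float = Token(Float())
word = Token(Word(Lower()))
punctuation = ~Token(r'[\.,]')  # ~ drops the matched text from the results

# any number of floats or words, separated by punctuation
line = (float | word)[:, punctuation]
parser = line.string_parser()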

will generate a lexer with three tokens. here two are specified using
lepl's matchers and one using a regexp, but in all three cases they
are converted to dfas internally.

then a parser is generated that will match a sequence of floats and
words, separated by punctuation. spaces are discarded by the lexer by
default, but that can be changed through the configuration (which
would be passed to the string_parser method).
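
for readers without lepl installed, the behaviour described above can be roughly imitated with the standard library. this is only a stand-in, under stated assumptions: it uses python's backtracking re module rather than a dfa, the FLOAT pattern is a guess at what a float token should accept, a trailing separator is treated as an error, and the names TOKEN_RE and parse_line are hypothetical (they are not part of lepl).

```python
import re

# One alternative per token kind; SPACE is matched so it can be discarded.
TOKEN_RE = re.compile(r"""
    (?P<FLOAT>[+-]?\d+\.\d+(?:[eE][+-]?\d+)?)
  | (?P<WORD>[a-z]+)
  | (?P<PUNCT>[.,])
  | (?P<SPACE>\s+)
""", re.VERBOSE)

def parse_line(text):
    """Return the floats and words in text, which must be punctuation-separated."""
    values = []
    expect_value = True  # True at the start and right after a separator
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        pos = m.end()
        kind = m.lastgroup
        if kind == "SPACE":
            continue  # spaces are dropped, as in the default described above
        if kind == "PUNCT":
            if expect_value:
                raise ValueError("separator without a preceding value")
            expect_value = True
        else:
            if not expect_value:
                raise ValueError("two values without a separator")
            values.append(m.group())
            expect_value = False
    if values and expect_value:
        raise ValueError("trailing separator")
    return values
```

calling parse_line("1.5, abc") gives the two matched strings with the comma and the space removed, while "abc 1.5" is rejected because nothing separates the two values.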

it's also possible to specify everything using matchers and then get
lepl to compile "as much as possible" of the matcher graph to nfas
before matching (nfas rather than dfas because they are implemented
with a stack to preserve the backtracking abilities of the recursive
descent parser they replace). the problem here is that not all
matchers can be converted (matchers can contain arbitrary python
functions, while my nfa+dfa implementations cannot, and also my
"compiler" isn't very smart), while using tokens explicitly gives you
an error if the automatic compilation fails (in which case the simple
fix is to just give the regexp).

(also, you say "sequence of bytes" rather than strings - lepl will
parse the bytes type in python3 and even has support for matching
binary values).

disclaimer: newish library, python 2.6+ only, and while i have quite a
few users (or, at least, downloads), i doubt that many use these more
advanced features, and everything is pure python with little
performance tuning so far.

andrew
