How to write a language parser?

Timothy Madden

Hello

I am trying to write a DBGp client in Python, to be used for debugging
mostly PHP scripts.

Currently the XDebug module for PHP allows me to set breakpoints on any
line, including blank ones and lines that are not considered executable,
resulting in breakpoints that will never be hit, even if program flow
control appears to pass through the lines.

To avoid that I would like to write a PHP parser, in order to detect the
proper breakpoint line for statements spanning multiple lines.

Is there an (open-source) way to do this in Python code? Most
parsers I could find after a search are either too simple for a real
programming language, or based on a Python module written in C. My debug
client is a Vim plugin, and I would like to distribute it as script
files only, if that is possible. The generator itself may well be a C
module, as I only distribute the generated output.

The best parser I could find is PLY, and I would like to know if it is
good enough for the job. My attempt at a bison parser (C only) ended in
about a hundred conflicts, most of which are difficult to understand,
although I admit I do not know much about the subject yet.

Are there other parsers you have used for complete languages?

Thank you,
Timothy Madden
 
Chris Angelico

> For that I would like to write a php parser, in order to detect the proper
> breakpoints line for statements spanning multiple lines.

Are you able to drop to PHP itself for that? It makes its own lexer
available to user-code:

http://php.net/manual/en/function.token-get-all.php

It's supposed to be able to tell you line numbers, too, though I
haven't actually used that. In theory, you should be able to use
token_get_all, then JSON encode it, and write the whole lot out to
stdout, where Python can pick it up and work with it.
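A rough sketch of that pipeline, assuming a `php` executable is on the PATH and that `php -r` passes arguments after `--` through as `$argv`. The names `PHP_CODE`, `php_tokens`, `lines_with_code` and the `IGNORED` set are made up for illustration. The PHP side swaps numeric token ids for `token_name()` strings so that Python can filter out whitespace and comments. Single-character tokens come back as bare strings without a line number, so they are skipped here; treat the result as a first approximation of "lines that contain code", not as a definitive list of lines XDebug will break on.

```python
import json
import subprocess

# PHP side: tokenize the file named in $argv[1], replace numeric token ids
# with their symbolic names, and print the whole list as JSON.
PHP_CODE = r'''
$tokens = token_get_all(file_get_contents($argv[1]));
foreach ($tokens as $i => $t) {
    if (is_array($t)) { $tokens[$i][0] = token_name($t[0]); }
}
echo json_encode($tokens);
'''

# Token kinds that never make a line a breakpoint candidate (an assumption;
# adjust to taste).
IGNORED = {"T_WHITESPACE", "T_COMMENT", "T_DOC_COMMENT",
           "T_OPEN_TAG", "T_CLOSE_TAG", "T_INLINE_HTML"}


def php_tokens(path, php="php"):
    """Run the PHP lexer on `path` and return the decoded token list."""
    result = subprocess.run([php, "-r", PHP_CODE, "--", path],
                            check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def lines_with_code(tokens):
    """Return the set of line numbers on which a non-ignored token starts."""
    lines = set()
    for tok in tokens:
        # Array tokens decode to [name, text, line]; bare strings such as
        # ";" or "=" carry no line number and are skipped.
        if isinstance(tok, list):
            name, _text, line = tok
            if name not in IGNORED:
                lines.add(line)
    return lines
```

Usage would be something like `lines_with_code(php_tokens("script.php"))`. Note that a token spanning several lines only reports the line it starts on, and `json_encode` may fail on source files that are not valid UTF-8.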

ChrisA
 
mbg1708

Take a look at this whitepaper:
http://www.cis.upenn.edu/~matuszek/General/recursive-descent-parsing.html

I needed a parser for a chunk of SQL syntax. After trying PyBison and writing crude text analysis in Python, I found this very useful paper. I used the advice in this paper to write my own recursive descent parser in pure Python. The two steps were:

1. Write a yacc grammar (without any actions). This step allowed me to get rid of various shift/reduce conflicts in my grammar.

2. Use the yacc grammar as a guide for the recursive descent parser. Essentially I wrote one parser function in Python for each yacc production.

The process has the merit that the yacc grammar is known to be free of conflicts before you start coding, so the eventual Python code is based on a good design.
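To illustrate the "one function per production" idea, here is a minimal sketch for a toy arithmetic grammar (not the SQL grammar I actually used; the names `tokenize`, `Parser` and `parse` are made up for this example). The productions are shown as comments above each method. One caveat: yacc grammars are usually written left-recursive, which a recursive descent parser cannot follow directly, so the repetition is turned into a loop.

```python
import re

# A number, or any single non-space character (operators, parentheses).
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")


def tokenize(text):
    """Turn the input into a list of (kind, value) pairs."""
    tokens = []
    for number, op in TOKEN_RE.findall(text):
        if number:
            tokens.append(("NUMBER", int(number)))
        else:
            tokens.append((op, op))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Kind of the next token, or "EOF" when the input is exhausted."""
        if self.pos < len(self.tokens):
            return self.tokens[self.pos][0]
        return "EOF"

    def take(self, kind):
        """Consume a token of the given kind and return its value."""
        if self.peek() != kind:
            raise SyntaxError(f"expected {kind}, got {self.peek()}")
        value = self.tokens[self.pos][1]
        self.pos += 1
        return value

    # expr : expr '+' term | expr '-' term | term
    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take(self.peek())
            node = (op, node, self.term())
        return node

    # term : term '*' factor | term '/' factor | factor
    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take(self.peek())
            node = (op, node, self.factor())
        return node

    # factor : NUMBER | '(' expr ')'
    def factor(self):
        if self.peek() == "(":
            self.take("(")
            node = self.expr()
            self.take(")")
            return node
        return self.take("NUMBER")


def parse(text):
    """Parse a whole expression into nested (op, left, right) tuples."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() != "EOF":
        raise SyntaxError(f"unexpected {parser.peek()}")
    return tree
```

For example, `parse("1 + 2 * (3 - 4)")` gives `("+", 1, ("*", 2, ("-", 3, 4)))`. A real language needs a proper lexer and many more productions, but each one follows the same pattern.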

Good luck.
 
Mark Lawrence

http://nedbatchelder.com/text/python-parsers.html
 
