Parser suggestion

Jorge Godoy

Hi!


I'm needing a parser to retrieve some information from source code --
including parts of code -- from Fortran, to use in a project with a
documentation system.

Any recommendations on a Python app or parser that I could use for that?


Thanks,
 
Michael J. Fromberger

Jorge Godoy said:
I'm needing a parser to retrieve some information from source code --
including parts of code -- from Fortran, to use in a project with a
documentation system.

Any recommendations on a Python app or parser that I could use for that?

There seems to be a great diversity of parsing tools available for
Python programmers. Here are a few suggestions to get you started:

PLY (Python Lex/Yacc)
http://www.dabeaz.com/ply/

PyParsing
http://pyparsing.sourceforge.net/

SPARK (Scanning Parsing And Rewriting Kit)
http://pages.cpsc.ucalgary.ca/~aycock/spark/

You might also find the following an interesting read, if this sort of
thing interests you:
http://www.python.org/sigs/parser-sig/towards-standard.html

Cheers,
-M
 
Jorge Godoy

Michael J. Fromberger said:
There seems to be a great diversity of parsing tools available for
Python programmers. Here are a few suggestions to get you started:

From Google I found almost all of those. But do you have any suggestion on
which one would be better to parse Fortran code? Or more productive to use
for this task?

This is new to me :)

From what I was seeing, this seems to be a good one to try...
SPARK (Scanning Parsing And Rewriting Kit)
http://pages.cpsc.ucalgary.ca/~aycock/spark/

It looks like it stopped being developed circa 2002... From 2002 to now
Python had a lot of improvements and I'd rather use a maintained tool for this
project. At least one that keeps up with Python's development...

You might also find the following an interesting read, if this sort of
thing interests you:
http://www.python.org/sigs/parser-sig/towards-standard.html

I will. But this is basically for "one project only". Other structures are
usually simpler than a programming language and can be retrieved with
different approaches.


Thank you very much for your suggestions.


Be seeing you,
 
Steven Bethard

Jorge said:
From Google I found almost all of those. But do you have any suggestion on
which one would be better to parse Fortran code? Or more productive to use
for this task?
[snip]

Well, I've never had to parse Fortran code, but I've had a lot of success
writing a variety of recursive grammars in PyParsing. I'd highly
recommend at least trying it out.

STeVe
 
François Pinard

[Jorge Godoy]
It looks like it stopped being developed circa 2002... From 2002 to
now Python had a lot of improvements and I'd rather use a maintained
tool for this project. At least one that keeps up with Python's
development...

While this way of thinking is in fashion, and often happens to be
right, it does not really apply in the SPARK case.

SPARK works fine and well, and is probably the most elegant and pythonic
of the series. If it does not really need to be further developed,
and does not have much to gain from recent Python releases, I do not see
why it should be released once in a while merely to entertain the crowd.
 
Jorge Godoy

Steven Bethard said:
Jorge said:
From Google I found almost all of those. But do you have any suggestion on
which one would be better to parse Fortran code? Or more productive to use
for this task? [snip]

Well, I've never had to parse Fortran code, but I've had a lot of success
writing a variety of recursive grammars in PyParsing. I'd highly recommend at
least trying it out.

STeVe

I've downloaded it. The interesting thing is that there are some examples
that parse things more complex than "2+3*8" :)
 
Jorge Godoy

François Pinard said:
[Jorge Godoy]
It looks like it stopped being developed circa 2002... From 2002 to
now Python had a lot of improvements and I'd rather use a maintained
tool for this project. At least one that keeps up with Python's
development...

While this way of thinking is in fashion, and often happens to be
right, it does not really apply in the SPARK case.

SPARK works fine and well, and is probably the most elegant and pythonic
of the series. If it does not really need to be further developed,
and does not have much to gain from recent Python releases, I do not see
why it should be released once in a while merely to entertain the crowd.

I don't consider it entertainment. Just code maintenance.

How can I be sure that if I find a bug I'll be able to discuss it with the
developer if it's 3 years since the last release of his code?

You're someone I admire and with whom I've worked before (on documentation),
so I value your opinion highly and will give SPARK a look based on your
recommendation.

But I still think that 3 years without any new implementation, design change
or maintenance release is a huge amount of time, especially in our area where
technology evolves really fast and new concepts are always popping up.


Thanks a lot for your opinion and for trying to open my eyes, François.
 
Dennis Lee Bieber

I'm needing a parser to retrieve some information from source code --
including parts of code -- from Fortran, to use in a project with a
documentation system.

FORTRAN has one of the most complex parsers around -- since (unless
recent standards have changed it) the language is not space delimited...
The following are IDENTICAL syntactically:

do 10 i = 1, 20, 3
do10i = 1,2 0,3

while the following is NOT a do loop

do 10 i = 1.2 03

And the only way to tell them apart is to parse up to the , or . --
and backtrack if you made the wrong choice (that is, if you assume "do
10 i =" started an assignment to "do10i", and found a "," you have to
backtrack and reparse as a loop).
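To make the ambiguity concrete, here is a minimal Python sketch of one way to decide between the two readings without explicit backtracking: squeeze out the blanks, then look for a comma outside parentheses on the right of the "=". The names squeeze, has_top_level_comma and classify are made up for this illustration, and the sketch assumes a single statement with no string literals and no optional comma after the label.

```python
import re

def squeeze(stmt):
    """Drop blanks, which fixed-form Fortran ignores outside string literals.
    Assumption: the statement contains no string literals."""
    return re.sub(r"\s+", "", stmt).lower()

def has_top_level_comma(text):
    """Return True if a comma occurs outside any parentheses."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return True
    return False

# Shape shared by "do <label> <var> = ..." and an assignment to "do<label><var>".
DO_SHAPE = re.compile(r"do(\d+)([a-z][a-z0-9]*)=(.+)$")

def classify(stmt):
    """Classify a statement as a DO loop, an assignment, or something else."""
    s = squeeze(stmt)
    m = DO_SHAPE.match(s)
    if m and has_top_level_comma(m.group(3)):
        # A comma after "=" can only belong to loop bounds.
        return ("do", m.group(1), m.group(2), m.group(3))
    if "=" in s:
        target, _, value = s.partition("=")
        return ("assign", target, value)
    return ("other", s)

for text in ("do 10 i = 1, 20, 3", "do10i = 1,2 0,3", "do 10 i = 1.2 03"):
    print(classify(text))
```

The first two inputs come out as the same DO loop, while the third is reported as an assignment to a variable named do10i.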

And then consider (fixed pitch please):
C234567890
      do
     x 1
     x0 =
     x1
     x.
C nonsense
     x20
     x 3
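A preprocessing pass can undo this kind of layout before any real parsing starts. The Python sketch below assumes the usual fixed-form rules (comment marker in column 1, continuation marker in column 6, statement text in columns 7 to 72) and assumes that the example above originally had "do" starting in column 7 and each "x" in column 6. The function name join_fixed_form is made up for this illustration.

```python
def join_fixed_form(lines):
    """Merge fixed-form continuation lines into complete statements.

    Assumed layout: 'C', 'c' or '*' in column 1 marks a comment; any
    character other than blank or '0' in column 6 marks a continuation;
    statement text occupies columns 7 to 72. Labels in columns 1 to 5
    are discarded in this sketch.
    """
    statements = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line[0] in "Cc*":
            continue  # blank or comment line, allowed between continuations
        body = line[6:72]
        is_continuation = len(line) > 5 and line[5] not in " 0"
        if is_continuation and statements:
            statements[-1] += body
        else:
            statements.append(body)
    return statements

example = [
    "C234567890",
    "      do",
    "     x 1",
    "     x0 =",
    "     x1",
    "     x.",
    "C nonsense",
    "     x20",
    "     x 3",
]
for stmt in join_fixed_form(example):
    print("".join(stmt.split()))  # blanks removed, as the language allows
```

Under those assumptions the nine lines collapse into the single statement do10=1.203, an assignment to a variable named do10.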



 
François Pinard

[Jorge Godoy]
You're someone [...]

You make me shy! :) Nevertheless, thanks for the appreciation! :)

How can I be sure that if I find a bug I'll be able to discuss it with
the developer if it's 3 years since the last release of his code?

SPARK is rock solid for me, and for the little doubts or improvements I
wanted, I remember having written to John Aycock, who always replied.
But of course, without having experienced it yourself, I understand that
one may have doubts. I also did not have any need for writing to John
in a few years. In any case, SPARK is a single module file, and not
such a big one after all. I very slightly adapted it to my own needs
and habits, and merely copy it from project to project since then.

One warning about SPARK is worth giving. As it accepts a wider
variety of grammars, it uses algorithms that may be slow depending
on your grammar design, and may become slow when you have errors in
your source. Compromises are needed. In all cases I used it so far,
whenever the input to parse was sizeable, it was easy for me to split
the source in smaller chunks with boundaries recognisable by other
means, and calling the parser on each chunk instead of globally on the
whole thing. This yielded reasonable parsing speed in production code.
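For Fortran, one plausible boundary "recognisable by other means" is the END line that closes each program unit. The Python sketch below splits a source text that way. It is only an illustration: split_units is a made-up name, the pattern assumes an unlabelled END line optionally followed by the unit kind and name, and the per-chunk parser call is left out.

```python
import re

# Assumption: each program unit closes with a line holding just END,
# optionally followed by the unit kind and name (END SUBROUTINE FOO).
# Lines such as END DO, ENDIF or END IF do not match.
END_OF_UNIT = re.compile(
    r"^\s*end(\s+(program|subroutine|function|module)\b.*)?\s*$",
    re.IGNORECASE,
)

def split_units(source):
    """Yield the text of one program unit at a time."""
    chunk = []
    for line in source.splitlines():
        chunk.append(line)
        if END_OF_UNIT.match(line):
            yield "\n".join(chunk)
            chunk = []
    if any(part.strip() for part in chunk):
        yield "\n".join(chunk)  # trailing text without a closing END

sample = """\
      subroutine a
      do i = 1, 3
      end do
      end
      function f(x)
      f = x
      end function f
"""
for number, unit in enumerate(split_units(sample), start=1):
    print(number, unit.splitlines()[0].strip())
```

Each yielded chunk can then be handed to the parser on its own, so one slow or failing chunk does not hold up the rest of the file.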

With SPARK, you have to provide a tokenizer. SPARK offers one based on
Python regular expressions, but I found that I often prefer writing
my own instead. If you have to process FORTRAN code, you may have
some difficulty in this area (yet I may be all wrong in saying so, as
my FORTRAN is very rusty, and I did not keep up with the evolution of
FORTRAN standards). At one time, FORTRAN allowed spurious,
ignored whitespace almost everywhere outside strings, even within
identifiers. Whitespace can (could) also be omitted between tokens where
it would clearly have been mandatory in any other language I know.

So, the split between the lexical and syntactic analysis for FORTRAN
is (or at least once was) fairly fuzzy. But this is theoretical. In
practice, FORTRAN programmers almost never resort to insane use of
whitespace. So you can probably settle for the easier, standard two-level
analysis, rather than FORTRAN as formally defined, and still come out ahead!
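As an illustration of the lexical half of such a two-level analysis, here is a small regular-expression tokenizer in plain Python. It does not use SPARK's own scanner class; TOKEN_SPEC and tokenize are names made up for this sketch, and the token set is a deliberately small subset that assumes sanely spaced source.

```python
import re

# Assumption: a small token subset, enough for simple statements.
EXPONENT = r"(?:[ed][+-]?\d+)?"
TOKEN_SPEC = [
    # The lookahead keeps "1.eq.2" from being read as the number "1.".
    ("NUMBER", r"\d+(?:\.(?![a-z]+\.)\d*)?" + EXPONENT + r"|\.\d+" + EXPONENT),
    ("STRING", r"'(?:[^']|'')*'"),
    ("DOTOP", r"\.[a-z]+\."),        # .eq. .and. .true. and friends
    ("NAME", r"[a-z][a-z0-9_]*"),
    ("OP", r"\*\*|//|[-+*/=(),:]"),
    ("SKIP", r"[ \t]+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.IGNORECASE,
)

def tokenize(stmt):
    """Return a list of (kind, text) pairs for one statement."""
    tokens = []
    pos = 0
    while pos < len(stmt):
        m = MASTER.match(stmt, pos)
        if m is None:
            raise SyntaxError(
                f"unexpected character {stmt[pos]!r} at column {pos + 1}")
        kind, text = m.lastgroup, m.group()
        if kind != "SKIP":
            # Fortran is case-insensitive outside string literals.
            tokens.append((kind, text if kind == "STRING" else text.lower()))
        pos = m.end()
    return tokens

print(tokenize("IF (X .GE. 1.5) Y = X**2"))
```

The resulting list of (kind, text) pairs is what a grammar-level parser would consume. Blank-insensitive input such as "do10i=1,20,3" would still need special handling before or inside this step.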
 
suman.karumuri

Take a look at PLY.
There is an example lexer in the download for parsing fortran.
 
