PHP Parser for Java?

pek

OK, I believe I've looked enough on the web (google.com, irc.freenode.net,
Google Groups, etc.).
I can't seem to find a single PHP Parser that I can use in Java.
I have downloaded Apache Xerces but I believe it is not relevant (or
I'm missing something).

Anyway, to make it easier, this is what I want *exactly*:

I'm developing a small program that parses HTML files, extracts only the
PHP code and tries to recognize all PHP functions that belong to a PHP
library.

Example:
<html>
<?php
mysql_connect("localhost","pek","peker");
?>
</html>

The program will parse the PHP code, recognize that "mysql_connect(*)"
is a function, try to find which library the function belongs to using
an XML file that lists libraries and their associated functions
(this is where Xerces comes in), and output an array of Strings with this
information: CodeLine, Function, Library, etc.
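The XML lookup step could be done with the standard javax.xml.parsers DOM API, whose default implementation in the JDK is derived from Xerces, so a separate Xerces download is not strictly required. This is a minimal sketch: the XML layout (`<libraries>`, `<library name="...">`, `<function>`) is a hypothetical format, because the real file is not shown, and the class name `LibraryIndex` is made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: maps PHP function names to library names read from XML.
// Assumed XML layout (NOT taken from the original post):
//   <libraries>
//     <library name="mysql">
//       <function>mysql_connect</function>
//     </library>
//   </libraries>
final class LibraryIndex {
    private final Map<String, String> functionToLibrary = new HashMap<>();

    LibraryIndex(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList libraries = doc.getElementsByTagName("library");
            for (int i = 0; i < libraries.getLength(); i++) {
                Element library = (Element) libraries.item(i);
                String libraryName = library.getAttribute("name");
                NodeList functions = library.getElementsByTagName("function");
                for (int j = 0; j < functions.getLength(); j++) {
                    // PHP function names are case-insensitive, so store them lower-cased.
                    String name = functions.item(j).getTextContent().trim().toLowerCase(Locale.ROOT);
                    functionToLibrary.put(name, libraryName);
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not read library XML", e);
        }
    }

    // Returns the library that lists the function, or null if it is not listed.
    String libraryOf(String functionName) {
        return functionToLibrary.get(functionName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        LibraryIndex index = new LibraryIndex(
                "<libraries><library name=\"mysql\">"
                + "<function>mysql_connect</function>"
                + "</library></libraries>");
        System.out.println(index.libraryOf("mysql_connect")); // prints mysql
    }
}
```

Lower-casing on both insert and lookup mirrors PHP's own rule that function names are case-insensitive, so `MySQL_Connect(...)` in a page would still be found.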

I simply need a PHP Parser.
Any help/ideas/tips/suggestions/etc..?
Thank you for your time.
 
Matt Rose

Have a look at antlr http://www.antlr.org/ and javacc
https://javacc.dev.java.net/ then either toss a coin, take your pick,
or try them both.

Matt
 
Moiristo

Matt said:
Have a look at antlr http://www.antlr.org/ and javacc
https://javacc.dev.java.net/ then either toss a coin, take your pick,
or try them both.

I think it is unnecessary to create a complete PHP parser in Java just
to filter out the functions it contains. A simple ad-hoc algorithm would
suffice. Just get all the PHP blocks from the page, and then use
Pattern/Matcher with a regex to find all the function calls they contain.
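The ad-hoc approach above might look roughly like the following sketch. It is a heuristic, not a parser: it assumes full `<?php ... ?>` tags (short `<?` and `<?=` tags are ignored), it does not skip string literals, comments or heredocs (so a string containing `foo(` would be reported), and user-defined `function foo(` declarations are reported too, although they simply won't match anything in the library XML. `PhpFunctionScanner`, the `line:function` output format and the keyword list are hypothetical choices for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds "name(" call sites inside <?php ... ?> blocks.
final class PhpFunctionScanner {
    // A <?php block runs to the first ?> or, if unterminated, to end of input.
    // DOTALL lets '.' span newlines; the lazy '*?' stops at the first ?>.
    private static final Pattern PHP_BLOCK =
            Pattern.compile("<\\?php(.*?)(?:\\?>|\\z)", Pattern.DOTALL);

    // An identifier followed by '(' that is not preceded by a word char,
    // '$' (variable functions), '>' (->method calls) or ':' (::static calls).
    private static final Pattern CALL =
            Pattern.compile("(?<![\\w$>:])([A-Za-z_]\\w*)\\s*\\(");

    // Language constructs that look like calls; a deliberately short list.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "if", "elseif", "while", "for", "foreach", "switch", "catch",
            "function", "array", "list", "return", "echo", "print"));

    // Returns one "line:function" entry per call site, in source order.
    static List<String> scan(String html) {
        List<String> calls = new ArrayList<>();
        Matcher block = PHP_BLOCK.matcher(html);
        while (block.find()) {
            int codeStart = block.start(1);
            Matcher call = CALL.matcher(block.group(1));
            while (call.find()) {
                String name = call.group(1);
                if (KEYWORDS.contains(name.toLowerCase(Locale.ROOT))) continue;
                int line = lineOf(html, codeStart + call.start(1));
                calls.add(line + ":" + name);
            }
        }
        return calls;
    }

    // 1-based line number of a character offset in the whole page.
    private static int lineOf(String text, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') line++;
        }
        return line;
    }

    public static void main(String[] args) {
        String html = "<html>\n<?php\nmysql_connect(\"localhost\",\"pek\",\"peker\");\n?>\n</html>";
        System.out.println(scan(html)); // prints [3:mysql_connect]
    }
}
```

If those false positives turn out to matter, that is the point where a real grammar-based tool such as ANTLR or JavaCC starts to pay off.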
 
Dave Glasser

-0700 in comp.lang.java.programmer:

Have a look at antlr http://www.antlr.org/ and javacc
https://javacc.dev.java.net/ then either toss a coin, take your pick,
or try them both.

In an upcoming project, I'm going to have to use one of these tools
for parsing mathematical expressions, and I have no experience with
either of them. I'd be interested in feedback from someone who's used
both. Is one significantly better than the other, or is a coin toss in
fact as good a way as any for choosing one over the other?



--
Check out QueryForm, a free, open source, Java/Swing-based
front end for relational databases.

http://qform.sourceforge.net

If you're a musician, check out RPitch Relative Pitch
Ear Training Software.

http://rpitch.sourceforge.net
 
Moiristo

Dave said:
In an upcoming project, I'm going to have to use one of these tools
for parsing mathematical expressions, and I have no experience with
either of them. I'd be interested in feedback from someone who's used
both. Is one significantly better than the other, or is a coin toss in
fact as good a way as any for choosing one over the other?

There is a course here at the university to learn how to build a
compiler using Antlr. I don't know about JavaCC, but Antlr was quite
easy to understand and you could very easily create a compiler for a
custom language; you only need basic Java and (E)BNF knowledge.
 
Dražen Gemić

This is not strictly on topic, but there is a PHP interpreter in Resin.
I think it is in the open-source part, so it might help.

DG
 
Paul Cager

Moiristo said:
There is a course here at the university to learn how to build a
compiler using Antlr. I don't know about JavaCC, but Antlr was quite
easy to understand and you could very easily create a compiler for a
custom language; you only need basic Java and (E)BNF knowledge.

I've always found JavaCC easy to use (well, as easy as a compiler
generator can be...). JavaCC is a top-down recursive descent tool which
feels similar to a hand-crafted parser. I believe the best introduction
to JavaCC is:

http://www.engr.mun.ca/~theo/JavaCC-FAQ/

I've never used ANTLR, but it seems very similar to JavaCC. I must
reluctantly concede that ANTLR seems to be better documented.
 
maaxiim

pek wrote:
Anyway, to make it easier, this is what I want *exactly*:

I'm developing a small program that parsers HTML files, gets only the
PHP code and tries to recognize all PHP functions that use a PHP
library.
I simply need a PHP Parser.
Any help/ideas/tips/suggestions/etc..?
Thank you for your time.

Try searching on krugle.com: enter 'php parser' in the search box and
select Java as the language. The very first hit I got was 'jhp', which,
after a superficial scan, appears to do something along the lines of
what you require.

regards

maaxiim
 
Oliver Wong

Dave Glasser said:
In an upcoming project, I'm going to have to use one of these tools
for parsing mathematical expressions, and I have no experience with
either of them. I'd be interested in feedback from someone who's used
both. Is one significantly better than the other, or is a coin toss in
fact as good a way as any for choosing one over the other?

Go with ANTLR. AFAIK, the JavaCC community is dead, but ANTLR is still
being actively developed, and the lead developer participates on the mailing
list and is open to future design suggestions.

- Oliver
 
Paul Cager

Oliver said:
Go with ANTLR. AFAIK, the JavaCC community is dead, but ANTLR is
still being actively developed, and the lead developer participates on
the mailing list and is open to future design suggestions.

- Oliver

Well, JavaCC is still being developed (e.g. there is the Java 1.5
version). But I agree that it is less "active" than ANTLR.
 
EJP

Dave said:
In an upcoming project, I'm going to have to use one of these tools
for parsing mathematical expressions, and I have no experience with
either of them. I'd be interested in feedback from someone who's used
both. Is one significantly better than the other, or is a coin toss in
fact as good a way as any for choosing one over the other?

If you're really just parsing mathematical expressions, either is
considerable overkill. A tiny recursive-descent parser will be quite
adequate.
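To illustrate what a tiny recursive-descent parser for arithmetic looks like, here is a minimal sketch: one method per grammar rule, with the loops giving left-associativity and the nesting of rules giving `*` and `/` higher precedence than `+` and `-`. It assumes only decimal numbers, `+ - * /`, unary minus and parentheses, evaluates directly instead of building a tree, and `ExprParser` is a hypothetical name.

```java
// Hypothetical recursive-descent evaluator for the grammar:
//   expr   := term (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := ('+' | '-') factor | number | '(' expr ')'
final class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String src) {
        this.src = src;
    }

    static double evaluate(String expression) {
        ExprParser p = new ExprParser(expression);
        double value = p.parseExpr();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return value;
    }

    private double parseExpr() {
        double value = parseTerm();
        while (true) {
            if (eat('+')) value += parseTerm();
            else if (eat('-')) value -= parseTerm();
            else return value;
        }
    }

    private double parseTerm() {
        double value = parseFactor();
        while (true) {
            if (eat('*')) value *= parseFactor();
            else if (eat('/')) value /= parseFactor();
            else return value;
        }
    }

    private double parseFactor() {
        if (eat('+')) return parseFactor();
        if (eat('-')) return -parseFactor();
        if (eat('(')) {
            double value = parseExpr();
            if (!eat(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) throw new IllegalArgumentException("Expected a number at " + pos);
        // A malformed literal such as "1.2.3" throws NumberFormatException,
        // which is itself an IllegalArgumentException.
        return Double.parseDouble(src.substring(start, pos));
    }

    // Skips whitespace, then consumes ch if it is the next character.
    private boolean eat(char ch) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == ch) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4) - 10 / 4")); // prints 11.5
    }
}
```

Adding variables or function calls would mostly mean extending `parseFactor`. This is the same top-down style that JavaCC produces from a grammar, which is why a hand-written version stays small for a grammar of this size.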
 
Paul Cager

EJP said:
If you're really just parsing mathematical expressions, either is
considerable overkill. A tiny recursive-descent parser will be quite
adequate.

Good point. In fact, do you need a parser at all? Would a scripting
engine like BeanShell work?
 
Dave Glasser

If you're really just parsing mathematical expressions, either is
considerable overkill. A tiny recursive-descent parser will be quite
adequate.

Which one do you recommend?
 
