Parser as an alternative to RegExen

S. Robert James

I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.
 
James Edward Gray II

I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

I really don't think there's any value in going all the way to a
parser generator here. This job looks to be squarely in the Regexp
domain, so there's no reason to feel bad about using them.
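
For illustration, here is a minimal sketch of how the string building blocks from the question could be compiled once and applied line by line. The helper name `user_ids` and the sample lines are assumptions made up for this example; they are not from the original post.

```ruby
# Building blocks kept as plain strings, as in the question.
# Single quotes keep the backslash in '\d' literal.
PREAMBLE   = 'AA'
USERID     = '\d{8}'

# Compile the compound pattern once, up front.
USER_HELLO = Regexp.new("#{PREAMBLE}(#{USERID})")

# Hypothetical helper: collect the captured user ID from every
# line that contains a user hello.
def user_ids(lines)
  lines.each_with_object([]) do |line, ids|
    match = USER_HELLO.match(line)
    ids << match[1] if match
  end
end

# Made-up sample input.
sample = ["AA12345678 logged in", "noise", "xxAA00000001"]
p user_ids(sample)  # prints ["12345678", "00000001"]
```

For a large file, passing `File.foreach(path)` (which returns an enumerator when called without a block) instead of an array should keep only one line in memory at a time.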

James Edward Gray II
 
Logan Capaldo

I really don't think there's any value in going all the way to a
parser generator here. This job looks to be squarely in the Regexp
domain, so there's no reason to feel bad about using them.

Agreed.

OTOH, parsers are sure fun to write (especially recursive descent ones
for simple grammars)!

If you do decide to go with a parser generator, check out Dhaka,
http://dhaka.rubyforge.org/
 
Robert Klemme

I really don't think there's any value in going all the way to a parser
generator here. This job looks to be squarely in the Regexp domain, so
there's no reason to feel bad about using them.

Agree. Also, in Ruby Regexp objects can nicely be used to build larger
expressions because Regexp#to_s is nicely implemented to retain all the
settings:

irb(main):001:0> PREAMBLE = /AA/
=> /AA/
irb(main):002:0> USERID = /\d{8}/
=> /\d{8}/
irb(main):003:0> USER_HELLO = /#{PREAMBLE}(#{USERID})/
=> /(?-mix:AA)((?-mix:\d{8}))/

That way you can make sure that all sub expressions are valid and you
can nicely mix options - if you need to (for example, preamble case
insensitive).
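
A small sketch of that option mixing, assuming a case-insensitive preamble. The `TRAILER` piece is hypothetical and only added here to show that the `i` flag does not leak into the other sub-expressions:

```ruby
PREAMBLE   = /aa/i      # case-insensitive (assumed requirement)
USERID     = /\d{8}/
TRAILER    = /Z/        # hypothetical, stays case-sensitive

# Interpolation uses Regexp#to_s, so each piece keeps its own options.
USER_HELLO = /#{PREAMBLE}(#{USERID})#{TRAILER}/

p PREAMBLE.to_s                           # prints "(?i-mx:aa)"
p USER_HELLO.match("Aa12345678Z")[1]      # prints "12345678"
p USER_HELLO.match("Aa12345678z")         # prints nil
```

The last line fails to match because only the preamble carries the `i` option; the trailer is still matched case-sensitively.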

Kind regards

robert
 
Ola Bini

S. Robert James said:
I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.


As other people have mentioned, there is no problem with using Regexps
for this. But another approach which I find really nice is to use Ragel.
Ragel is a generator for finite state machines which recently got a
backend for Ruby (so far it's only in version control).

The regexps would look almost the same, but the speed would increase
greatly.

--
Ola Bini (http://ola-bini.blogspot.com)
JvYAML, RbYAML, JRuby and Jatha contributor
System Developer, Karolinska Institutet (http://www.ki.se)
OLogix Consulting (http://www.ologix.com)

"Yields falsehood when quined" yields falsehood when quined.
 
David Vallner

I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.

One instance where I'd be thinking of picking up parser-fu would be if
the data contains recursively nested structures of some sort. Either the
regexes or the ancillary code juggling them gets hairy anyway, losing
you the simplicity, and you still have to work your way through the
nesting levels manually, which an AST parser would do for you.
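
To make the nesting case concrete, here is a rough sketch of a hand-written recursive descent parser built on `StringScanner` from Ruby's standard library. The parenthesized grammar is invented purely for illustration (the original post does not describe any nested format), and `parse` and `parse_list` are hypothetical names.

```ruby
require 'strscan'

# Hypothetical grammar, made up for this example:
#   list := '(' item* ')'
#   item := list | USER_HELLO
USER_HELLO = /AA(\d{8})/

# Parse one parenthesized list starting at the scanner's position.
def parse_list(scanner)
  scanner.skip(/\s*/)
  scanner.scan(/\(/) or raise ArgumentError, "expected '(' at #{scanner.pos}"
  items = []
  loop do
    scanner.skip(/\s*/)
    if scanner.scan(/\)/)
      return items                     # end of this nesting level
    elsif scanner.check(/\(/)
      items << parse_list(scanner)     # recurse into a nested list
    elsif scanner.scan(USER_HELLO)
      items << scanner[1]              # captured user ID
    else
      raise ArgumentError, "unexpected input at #{scanner.pos}"
    end
  end
end

# Parse a whole string and reject anything left over after the list.
def parse(text)
  scanner = StringScanner.new(text)
  result = parse_list(scanner)
  scanner.skip(/\s*/)
  raise ArgumentError, "trailing input at #{scanner.pos}" unless scanner.eos?
  result
end

p parse("(AA12345678 (AA00000001 (AA00000002)) AA99999999)")
# prints ["12345678", ["00000001", ["00000002"]], "99999999"]
```

The token-level matching is still done by the same regex; the recursion only takes over the bookkeeping of nesting levels, which is the part that gets hairy with regexes alone.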

David Vallner
 
