Parser as an alternative to RegExen

S. Robert James · Feb 22, 2007

I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.

James Edward Gray II · Feb 22, 2007

S. Robert James said:
I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

I really don't think there's any value in going all the way to a
parser generator here. This job looks to be squarely in the Regexp
domain, so there's no reason to feel bad about using them.
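
A minimal sketch of staying with regexen, assuming the string fragments from the original post are compiled with Regexp.new before use. The sample input is hypothetical, since the thread does not show the real file format:

```ruby
# Build the compound pattern from string fragments, as in the original post.
PREAMBLE   = 'AA'
USERID     = '\d{8}'   # single quotes keep the backslash literal
USER_HELLO = Regexp.new("#{PREAMBLE}(#{USERID})")

# Hypothetical sample input; the real file layout is not given in the thread.
sample = "AA12345678\nBB00000000\nAA87654321\n"

# scan returns one array per match holding the captured groups.
user_ids = sample.scan(USER_HELLO).flatten
p user_ids   # ["12345678", "87654321"]
```

For a genuinely large file, the same pattern could be applied line by line (for example with File.foreach) rather than to one big string.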

James Edward Gray II

Logan Capaldo · Feb 22, 2007

James Edward Gray II said:
I really don't think there's any value in going all the way to a
parser generator here. This job looks to be squarely in the Regexp
domain, so there's no reason to feel bad about using them.

Agreed.

OTOH, Parsers are sure fun to write! (esp. rec descent ones for simple
grammars).

If you do decide to go with a parser generator, check out Dhaka,
http://dhaka.rubyforge.org/

Robert Klemme · Feb 22, 2007

James Edward Gray II said:
I really don't think there's any value in going all the way to a parser
generator here. This job looks to be squarely in the Regexp domain, so
there's no reason to feel bad about using them.

Agree. Also, in Ruby Regexp objects can nicely be used to build larger
expressions because Regexp#to_s is nicely implemented to retain all the
settings:

irb(main):001:0> PREAMBLE = /AA/
=> /AA/
irb(main):002:0> USERID = /\d{8}/
=> /\d{8}/
irb(main):003:0> USER_HELLO = /#{PREAMBLE}(#{USERID})/
=> /(?-mix:AA)((?-mix:\d{8}))/

That way you can make sure that all subexpressions are valid, and you
can mix options if you need to (for example, a case-insensitive
preamble).
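
A small sketch of the option mixing mentioned above. The case-insensitive preamble is an assumption made for illustration; the constant names follow the thread:

```ruby
PREAMBLE   = /aa/i      # case-insensitive, chosen only for illustration
USERID     = /\d{8}/
USER_HELLO = /#{PREAMBLE}(#{USERID})/

# Regexp#to_s embeds each fragment's own flags, so the /i applies
# to the preamble group only.
puts USER_HELLO.source   # (?i-mx:aa)((?-mix:\d{8}))

%w[AA12345678 aA12345678 AB12345678].each do |line|
  match = USER_HELLO.match(line)
  puts "#{line}: #{match ? match[1] : 'no match'}"
end
```

Because every fragment is a Regexp literal, a syntax error in any subexpression shows up where that fragment is defined rather than in the combined pattern.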

Kind regards

robert

Ola Bini · Feb 22, 2007

S. Robert James said:
I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.

As other people have mentioned, using Regexps for this is no biggie.
But another approach, which I find really nice, is to use Ragel.
Ragel is a generator for finite state machines which recently got a
backend for Ruby (so far it's only in version control).

The regexps would look almost the same, but the speed would increase
greatly.

--
Ola Bini (http://ola-bini.blogspot.com)
JvYAML, RbYAML, JRuby and Jatha contributor
System Developer, Karolinska Institutet (http://www.ki.se)
OLogix Consulting (http://www.ologix.com)

"Yields falsehood when quined" yields falsehood when quined.

David Vallner · Feb 22, 2007

S. Robert James said:
I'm parsing a large file, currently using compound regexen:

PREAMBLE = 'AA'
USERID = '\d{8}'
USER_HELLO = "#{PREAMBLE}(#{USERID})"

Is there a simple way to do this using a parser such as ANTLR? I've
never used one before, so if it requires a learning curve, I'll stick
to my regexen.

But if there is a cleaner way to do this, I'd certainly like to.

One instance where I'd be thinking of picking up parser-fu would be if
the data contains recursively nested structures of some sort. Either the
regexes, or the ancillary code juggling them gets hairy anyway, losing
you the simplicity, and you still have to work your way through the
nesting levels manually, which an AST parser would do for you.
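
To make the nesting point concrete, here is a minimal hand-written recursive descent sketch using StringScanner from Ruby's standard library. The parenthesized-list grammar and the method names parse and parse_list are assumptions for illustration; the thread does not describe an actual nested format:

```ruby
require 'strscan'

# Grammar (assumed for illustration):
#   list := '(' (list | word)* ')'
#   word := one or more word characters
def parse_list(scanner)
  scanner.scan(/\s*\(/) or raise ArgumentError, "expected '(' at #{scanner.pos}"
  items = []
  loop do
    scanner.skip(/\s+/)
    if scanner.scan(/\)/)
      return items                    # end of this nesting level
    elsif scanner.check(/\(/)
      items << parse_list(scanner)    # recurse for a nested list
    elsif (word = scanner.scan(/\w+/))
      items << word
    else
      raise ArgumentError, "unexpected input at #{scanner.pos}"
    end
  end
end

def parse(text)
  scanner = StringScanner.new(text)
  tree = parse_list(scanner)
  scanner.skip(/\s+/)
  raise ArgumentError, "trailing input at #{scanner.pos}" unless scanner.eos?
  tree
end

p parse("(AA 12345678 (BB (CC 1) 2))")
# ["AA", "12345678", ["BB", ["CC", "1"], "2"]]
```

The recursion in parse_list tracks the nesting depth implicitly, which is the bookkeeping a single regex cannot do for arbitrarily deep input.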

David Vallner
