[ANN] Regex Searching on Arbitrary Sequences

Michael Edgar

Good day Rubyists,

I've just finished a write-up on an interesting problem: using Ruby's Regexp engine
to search arbitrary sequences of potentially heterogenous objects. It's based on the
more specific instance used in Ripper in 1.9. I've packaged it into a gem, though it is
a bit rough around the edges.

The post can be found here: http://carboni.ca/blog/p/Regex-Search-on-Arbitrary-Sequences

And the gem can be found here: https://github.com/michaeledgar/object_regex

The gem requires Ruby 1.9+.

Cheers,
Mike Edgar
http://carboni.ca/
 
John Carter

Very Nice!

It's a very handy and powerful technique. And I really like the way you have
generalized it to "all objects"

I have used that technique twice before...

Once, decades ago, I used it to recognise the longest possible straight-line
sequence of pixel boundaries, to do raster-to-vector conversion in an exact
and near-optimal manner.

More recently I "meta'd" it to step up a lexer to a grammar parser with
LittleLexer. http://littlelexer.rubyforge.org/



--
John Carter Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : (e-mail address removed)
New Zealand






Jörg W Mittag

Michael said:
I've just finished a write-up on an interesting problem: using Ruby's Regexp engine
to search arbitrary sequences of potentially heterogenous objects. It's based on the
more specific instance used in Ripper in 1.9. I've packaged it into a gem though it is
a bit rough around the edges.

The post can be found here: http://carboni.ca/blog/p/Regex-Search-on-Arbitrary-Sequences

And the gem can be found here: https://github.com/michaeledgar/object_regex

This is pretty cool. I never understood why pretty much every language
except Erlang artificially restricts Regexps to text. (Erlang also
allows regular-expression-like pattern matching on bit strings.)

Functional languages and increasingly also modern OO languages (e.g.
Newspeak) have structural pattern matching over arbitrary types, but
without the parsing feature of Regexps (alternation, repetition, ...).
Scripting languages have Regexps but only over text strings, not
arbitrary types.

What I *really* would like to see is the union of pattern matching and
Regexps, ranging over arbitrary types. Unfortunately, I don't have the
slightest idea what that would look like.

jwm
 
Michael Edgar

In talking it over with the co-writer of the Regex-Searching writeup, we think
that with a bit of massaging of the existing code, defining meaningful #reg_desc
methods on Array, Class, Hash, and Object could get a good part of the way there.
=> Class
=> nil
ObjectRegex.new('Fixnum String+ Regexp?').all_matches([1, 'hi', 2, 3, 4, 'world', 'there', /abc/])
=> [[1, "hi"], [4, "world", "there", /abc/]]
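
For readers who want to see the general idea in isolation, here is a minimal sketch of how a search like the one above could work. It is not the object_regex implementation: the class TinyObjectRegex and everything in it are hypothetical names made up for this illustration. It assumes each object is described only by its exact class name, encodes every object as a single character, runs an ordinary Regexp over the encoded string, and maps the match offsets back to slices of the original array. It uses Integer rather than Fixnum, since Fixnum no longer exists in current Ruby versions.

```ruby
# Hypothetical illustration class; not part of the object_regex gem.
class TinyObjectRegex
  UNKNOWN = '_' # stands for any object whose class the pattern never mentions

  def initialize(pattern)
    @alphabet = {} # class name => single-letter token
    # Swap every class name in the pattern for its one-letter token,
    # leaving quantifiers such as + and ? untouched.
    source = pattern.gsub(/[A-Z]\w*/) { |name| letter_for(name) }
    @regexp = Regexp.new(source.delete(' '))
  end

  # Returns every non-overlapping run of objects matching the pattern.
  def all_matches(objects)
    # One character per object, so string offsets equal array indices.
    encoded = objects.map { |obj| @alphabet.fetch(obj.class.name, UNKNOWN) }.join
    matches = []
    pos = 0
    while pos <= encoded.length && (m = @regexp.match(encoded, pos))
      matches << objects[m.begin(0)...m.end(0)] unless m[0].empty?
      # Step past empty matches so the loop always makes progress.
      pos = m[0].empty? ? m.end(0) + 1 : m.end(0)
    end
    matches
  end

  private

  # Assigns 'a', 'b', 'c', ... in order of first appearance (at most 26 classes).
  def letter_for(name)
    @alphabet[name] ||= ('a'.ord + @alphabet.size).chr
  end
end

matcher = TinyObjectRegex.new('Integer String+ Regexp?')
p matcher.all_matches([1, 'hi', 2, 3, 4, 'world', 'there', /abc/])
# => [[1, "hi"], [4, "world", "there", /abc/]]
```

Because each object becomes exactly one character, no separate offset bookkeeping is needed. The sketch ignores subclasses and supports only 26 distinct classes per pattern.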

The syntax is a bit restrictive in the current version of object_regex, but I came up with this
quickly for tuple searching:
=> nil
ObjectRegex.new('Array_String_Fixnum_+').match([ ['string', /regex/], ['string2', 1], ['string3', 3], ['string4'] ])
=> [["string2", 1], ["string3", 3]]

I used a cautiously restrictive regex for picking the tokens out of the input pattern, but things like standard generics
syntax (Array<String>) could be possible.
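
As a rough sketch of what a generics-style token syntax might involve, the snippet below shows one possible token regex and a matching description method. Both pattern_tokens and type_description are hypothetical helpers invented for this example, not methods of object_regex, and the sketch assumes only one level of type arguments (no nested generics). It uses Integer rather than Fixnum, since Fixnum no longer exists in current Ruby versions.

```ruby
# Hypothetical token pattern: a class name, optionally followed by one
# level of <...> type arguments. Nested generics are not handled.
TOKEN = /[A-Z]\w*(?:<[^<>]*>)?/

# Picks the type tokens out of a pattern string, skipping quantifiers.
def pattern_tokens(pattern)
  pattern.scan(TOKEN)
end

# Describes an object in the same notation, so that it can be compared
# with a token: arrays list the classes of their elements.
def type_description(object)
  if object.is_a?(Array)
    "Array<#{object.map { |element| element.class.name }.join(', ')}>"
  else
    object.class.name
  end
end

p pattern_tokens('Array<String, Integer>+ String?')
# => ["Array<String, Integer>", "String"]
p type_description(['string2', 1])
# => "Array<String, Integer>"
```

A token and an object would then count as matching when the token text equals the object's description.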
 
