regex questions

Jeff Davis

In Python, regexes let you call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful. Is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

Regards,
Jeff Davis
 
Assaph Mehr

Jeff said:
In Python, regexes let you call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful. Is there something similar in Ruby?

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?


You can't write executable code within the regex, but...

When constructing the regex you can interpolate expressions that will be
evaluated, just like in a double-quoted string:

start_tag = 'a'
end_tag = 'b'
r = /#{start_tag}(.*?)#{end_tag}/ #=> /a(.*?)b/
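
One thing to watch with interpolation: the interpolated text is treated as regex source, so a delimiter containing metacharacters changes the pattern. A small sketch of guarding against that with Regexp.escape (the parenthesis delimiters and the sample string are made up for illustration):

```ruby
# Hypothetical delimiters; "(" and ")" are regex metacharacters.
start_tag = '('
end_tag   = ')'

# Regexp.escape turns each delimiter into a literal before interpolation.
r = /#{Regexp.escape(start_tag)}(.*?)#{Regexp.escape(end_tag)}/

inner = 'f(x) + g(y)'.scan(r).flatten
p inner  #=> ["x", "y"]
```

With plain letters like 'a' and 'b' the escape is a no-op, so it is safe to apply whenever the delimiters come from a variable.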

Also, if you call sub/gsub you can pass a block. E.g.

s1 = 'xxx a x foo x b xxx'
r = /a(.*?)b/
puts s1.sub(r) { |match| match =~ /foo/ ? '' : match } #=> prints "xxx  xxx" (both surrounding spaces remain)

HTH,
Assaph
 
Andrew Johnson

In Python, regexes let you call a function instead of just
substituting the values (see <http://docs.python.org/lib/node111.html> for
more details). That seems quite useful. Is there something similar in Ruby?

Yes, #sub and #gsub can be passed a block to be evaluated for
the replacement value.
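
As a rough illustration of the block form, here is a sketch that calls a method for each match, which is about as close as it gets to passing a function to Python's re.sub. The method name double and the sample strings are invented for the example:

```ruby
# Hypothetical helper: doubles a number given as text.
def double(text)
  (text.to_i * 2).to_s
end

prices = 'apples 3, pears 12'

# The block receives each matched substring; its return value replaces the match.
doubled = prices.gsub(/\d+/) { |m| double(m) }
p doubled  #=> "apples 6, pears 24"

# Inside the block, $~ holds the MatchData, so capture groups are available.
swapped = 'key=value'.sub(/(\w+)=(\w+)/) { "#{$~[2]}=#{$~[1]}" }
p swapped  #=> "value=key"
```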

Also, let's say I want to match anything between "a" and "b" unless it
contains the word "foo". I could write two regexes like so:

if str =~ /a(.*)b/ and str !~ /a(.*foo.*)b/

Careful, consider this string:

str = 'afoolbalmnbaopbafoob'

which has two sets of a..b that don't contain 'foo', but your
pairing rejects it because a..foo..b can also be found.

Is there a good way to make that kind of logic into one regex? Is there
some kind of "intersect" operator or a "not" operator?

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

re = %r{a((?:(?!foo).)*?)b}
str = 'afoolbalmnbaopbafoob'
str.scan(re).each do |m|
  p m
end
__END__
["lmn"]
["op"]
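
To make the difference between the two approaches concrete, here is a short side-by-side sketch on the same sample string (the variable names two_step and found are only for illustration):

```ruby
str = 'afoolbalmnbaopbafoob'

# Two-regex test from the question: the greedy .* spans the whole string,
# so the "foo" check fires even though clean a..b pairs exist.
two_step = (str =~ /a(.*)b/) && (str !~ /a(.*foo.*)b/)
p two_step  #=> false

# Tempered pattern: (?!foo). consumes one character only when "foo"
# does not start at that position.
re = %r{a((?:(?!foo).)*?)b}
found = str.scan(re).flatten
p found  #=> ["lmn", "op"]
```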

regards,
andrew
 
David A. Black

Hi --

You can't write executable code within the regex, but...

I don't think it will help Jeff's case, but in general you certainly
can include code in a regex if you want to:

irb(main):001:0> puts "match" if /abc#{gets.chomp}/.match("abcdef")
def
match


David
 
Jacob Fugal

A fairly standard way is to use negative look-ahead and inch
ahead one character at a time like:

re = %r{a((?:(?!foo).)*?)b}

Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal
 
Jeff Davis

Jacob said:
Thank you, oh thank you! I don't know how many times I've been told
that something like this was possible, but neither I nor the "guru"
who told me could make it work. You have my eternal gratitude!

Jacob Fugal
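
Another way is kind of complicated, but it works. Let's say that you
want to match a string like:

if str =~ /a(.*)b/ and str !~ /a(.*xyz.*)b/

then you can instead do:

if str =~ /[^a]*a([^bx]|x(x|yx)*([^bxy]|y[^bxz]))*(b|x(x|yx)*b|x(x|yx)*yb)/

The x(x|yx)* parts keep track of repeated prefixes, so strings such as
"axxyzb" and "axyxyzb" are still rejected. Unlike the two-regex test, this
stops at the first "b" after the "a".

[ I changed to 'xyz' from 'foo' to show what's going on in the regex
better ]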

It's nice to have one regex like that, but you can see that it gets
complicated and hard to read, especially as the string you're avoiding
(in this case xyz) turns into a complicated regex.

Technically, you can build any regular expression with only "()", "|"
and "*" (and of course concatenation, which is just two expressions next
to each other, no operator is needed). Andrew's is much more readable,
however.
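
Hand-built patterns like this are easy to get subtly wrong (repeated prefixes such as "xxyz" are the usual trap), so it can help to compare one against the lookahead form on every short string over a small alphabet. A sketch of that check; the variable names, the five-letter alphabet and the length limit of 6 are arbitrary choices for the example, and the lookahead pattern is restricted with [^b] so both stop at the first "b":

```ruby
# Lookahead form, restricted so the span between "a" and "b" holds no "b".
lookahead = /a(?:(?!xyz)[^b])*b/

# Lookahead-free form: the alternations track how much of "xyz" has been seen.
manual = /a([^bx]|x(x|yx)*([^bxy]|y[^bxz]))*(b|x(x|yx)*b|x(x|yx)*yb)/

# Every string of length 1..6 over a small illustrative alphabet.
candidates = (1..6).flat_map do |n|
  %w[a b x y z].repeated_permutation(n).map(&:join)
end

# Keep only the strings on which the two patterns disagree.
disagreements = candidates.reject do |s|
  lookahead.match?(s) == manual.match?(s)
end
p disagreements.size  #=> 0
```

An exhaustive check over short strings is not a proof, but it catches the common slips quickly.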

Regards,
Jeff Davis

Note: I know I answered my own question. I did a little research about
regexes first. Thanks Andrew for the negative-lookahead thing, that's
what I was looking for.
 
