search reg-exp for exact match


John Butler

Hi,

I have a regular expression
/\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
and I want to check whether various years are present.

"2003" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0 as expected

"2010" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns nil as expected

But I want only exact matches, so when I search for "2003 - 2008" I want
nil returned, as there is no exact match for that particular string. I
thought the \b would give me this, but it doesn't.

"2003 - 2008" =~ /\b2003|\2004|\2005|\2006|\2007|\2008|\2009\b/
returns 0, but I want nil returned.

Can anyone help?

Jb
 

Jesús Gabriel y Galán

To do exactly what you are asking for, you can anchor the regexp
to the beginning and end of the string:

irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
=> /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
irb(main):014:0> "2003" =~ re
=> 0
irb(main):015:0> "2003 - 2008" =~ re
=> nil

In this case you don't need the \b anymore. BTW, you had typos there
because you had \2 instead of \b.
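
To show why \b alone could not give the requested behaviour, here is a small sketch (the grouped patterns are illustrative, not from the original post). \b is a word boundary, not a string boundary, so even a correctly grouped alternation still finds "2003" as a whole word inside "2003 - 2008"; only \A and \z pin the match to the whole string. Also, without a group, | has the lowest precedence, so a leading \b binds only to the first alternative and a trailing \b only to the last.

```ruby
# \b marks a word boundary (between a word and a non-word character),
# not the start or end of the whole string.
word_bounded = /\b(?:2003|2004|2005|2006|2007|2008|2009)\b/

# Anchored version: \A = start of string, \z = end of string.
whole_string = /\A(?:2003|2004|2005|2006|2007|2008|2009)\z/

p("2003 - 2008" =~ word_bounded) # => 0   ("2003" is a whole word here)
p("2003 - 2008" =~ whole_string) # => nil (the whole string is not a year)
p("2003" =~ whole_string)        # => 0
```
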
Anyway, if you want exact matches of strings you don't need regexps:

irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
=> ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
irb(main):020:0> years.include? "2003"
=> true
irb(main):021:0> years.include? "2003 - 2008"
=> false

If you have many numbers and many lookups, a Set should be better,
performance-wise.
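
A minimal sketch of the Set suggestion (the constant name YEARS is just an example): Set#include? is a hash lookup, so it avoids the linear scan that Array#include? does on every call.

```ruby
require 'set'

# Build the lookup table once; each membership test is then a hash lookup
# rather than a scan through an Array.
YEARS = Set.new((2003..2009).map(&:to_s))

p YEARS.include?("2003")        # => true
p YEARS.include?("2003 - 2008") # => false
p YEARS.include?("2010")        # => false
```
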
Now, if we are talking about ranges of years, we can do even better:

irb(main):022:0> min_year = 2003
=> 2003
irb(main):023:0> max_year = 2009
=> 2009
irb(main):024:0> year_to_test = "2003".to_i
=> 2003
irb(main):025:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):026:0> year_to_test = "2008".to_i
=> 2008
irb(main):027:0> min_year <= year_to_test and year_to_test <= max_year
=> true
irb(main):028:0> year_to_test = "2010".to_i
=> 2010
irb(main):029:0> min_year <= year_to_test and year_to_test <= max_year
=> false
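
One caveat if the input goes straight through to_i: String#to_i parses only the leading digits and ignores the rest, so "2003 - 2008".to_i is 2003 and the comparison above would accept it. A sketch that checks the format first (the method name year_in_range? is hypothetical):

```ruby
# Hypothetical helper: true only if the whole string is a four-digit year
# between min_year and max_year (inclusive).
def year_in_range?(str, min_year, max_year)
  return false unless str.match?(/\A\d{4}\z/) # rejects "2003 - 2008", "2003x", ...
  str.to_i.between?(min_year, max_year)
end

p "2003 - 2008".to_i                        # => 2003 (to_i stops at the space)
p year_in_range?("2003 - 2008", 2003, 2009) # => false
p year_in_range?("2008", 2003, 2009)        # => true
p year_in_range?("2010", 2003, 2009)        # => false
```
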


Hope this helps,

Jesus.
 

Robert Klemme

> To do exactly what you are asking for, you can anchor the regexp
> to the beginning and end of the string:
>
> irb(main):013:0> re = /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> => /\A(2003|2004|2005|2006|2007|2008|2009)\Z/
> irb(main):014:0> "2003" =~ re
> => 0
> irb(main):015:0> "2003 - 2008" =~ re
> => nil

I'd rather use /\A200[3-9]\z/.
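
The earlier pattern ends in \Z, while this one uses \z, and the difference matters for exact matching: \Z also matches just before a trailing newline, whereas \z matches only at the very end of the string. The character class [3-9] also only spans a single decade; a range such as 2003 to 2012 (an example not from the thread) needs an alternation. A short sketch:

```ruby
loose  = /\A200[3-9]\Z/ # \Z: end of string, or just before a final newline
strict = /\A200[3-9]\z/ # \z: end of string only

p("2005\n" =~ loose)  # => 0   (trailing newline tolerated)
p("2005\n" =~ strict) # => nil
p("2005" =~ strict)   # => 0

# A character class only spans one decade; 2003..2012 needs two branches.
decades = /\A20(?:0[3-9]|1[0-2])\z/
matches = %w[2003 2009 2010 2012 2013].select { |y| y.match?(decades) }
p matches # => ["2003", "2009", "2010", "2012"]
```
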
> In this case you don't need the \b anymore. BTW, you had typos there
> because you had \2 instead of \b.
> Anyway, if you want exact matches of strings you don't need regexps:
>
> irb(main):018:0> years = (2003..2009).map {|x| x.to_s}
> => ["2003", "2004", "2005", "2006", "2007", "2008", "2009"]
> irb(main):020:0> years.include? "2003"
> => true
> irb(main):021:0> years.include? "2003 - 2008"
> => false

Or

irb(main):001:0> s="2005"
=> "2005"
irb(main):002:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> true
irb(main):003:0> s="2010"
=> "2010"
irb(main):004:0> (2003..2009) === s[/\A\d{4}\z/].to_i
=> false
irb(main):006:0> (2003..2009).include? s[/\A\d{4}\z/].to_i
=> false

Note: this works because a string that doesn't match yields nil, and 0 (= nil.to_i) is not part of the range!
> If you have many numbers and many lookups, a Set should be better,
> performance-wise.

You can even use a bit set:

irb(main):007:0> t = (2003..2009).inject(0) {|mask,y| mask | 1 << y}
=>
116650078639864259662055853239652489576667478532211432368528502061497852157464823887836603809757037023714110007321126217782227286423686421672874625786531963635756068971637276480699799614611885589371789821904502024698121311064730577770474098457113815634439476503092997189887743679313284635928742849521858004245675611528209841692017556564840683843349732924435866760173931843810360262352061792429448169450281904579322760817054128336138506834410834183565543664844525391283837108127106791786643268532096672079466512393065631776802367002142967381057920196424747178242497261636008255151052901022379808767413846016
irb(main):008:0> t[s.to_i]
=> 0
irb(main):009:0> s="2005"
=> "2005"
irb(main):010:0> t[s.to_i]
=> 1
irb(main):011:0>
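
A variation on the bit set, not from the thread: the mask above sets bits 2003 to 2009, so the Integer is over 2000 bits wide. Shifting by the first year keeps it to seven bits; Integer#[] returns the bit at the given position. The format check via s[/\A\d{4}\z/] is kept because to_i alone would turn "2003 - 2008" into 2003. The names BASE_YEAR, MASK and year_bit_set? are hypothetical.

```ruby
BASE_YEAR = 2003
# One bit per year, counted from BASE_YEAR: bits 0..6 stand for 2003..2009.
MASK = (2003..2009).inject(0) { |mask, y| mask | 1 << (y - BASE_YEAR) }

def year_bit_set?(str)
  digits = str[/\A\d{4}\z/]   # nil unless the whole string is four digits
  return false if digits.nil?
  offset = digits.to_i - BASE_YEAR
  offset >= 0 && MASK[offset] == 1
end

p MASK                         # => 127
p year_bit_set?("2005")        # => true
p year_bit_set?("2010")        # => false
p year_bit_set?("2003 - 2008") # => false
```
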

There are many ways... :)
> Now, if we are talking about ranges of years we can do even better:

... or use the range test (as above) directly.

Kind regards

robert