[QUIZ] Regexp.build() (#4)

Ruby Quiz · Oct 15, 2004

The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.grayproductions.net/ruby_quiz/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There's been some discussion on Ruby Talk lately about Range.member? which tests
if a given element (often a number) is a member of the set the Range object
iterates over. Obviously, this kind of test is useful in many aspects of
programming, but let's approach this problem from a different angle.

This week's quiz is to build a library that adds a class method called build()
to Regexp. build() should accept a variable number of arguments which can
include integers and ranges of integers. Have build() return a Regexp object
that will match only integers in the set of passed arguments.

Here are some examples of possible usage:

lucky = Regexp.build( 3, 7 )
"7" =~ lucky # => true
"13" =~ lucky # => false
"3" =~ lucky # => true

month = Regexp.build( 1..12 )
"0" =~ month # => false
"1" =~ month # => true
"12" =~ month # => true
day = Regexp.build( 1..31 )
"6" =~ day # => true
"16" =~ day # => true
"Tues" =~ day # => false
year = Regexp.build( 98, 99, 2000..2005 )
"04" =~ year # => false
"2004" =~ year # => true
"99" =~ year # => true

num = Regexp.build( 0..1_000_000 )
"-1" =~ num # => false

Some issues you may want to consider while building you're library:

* How should leading zeros be handled?

Match the hour from a clock formatted in military time (0 to 23). Hours 0
through 9 may or may not have a single leading zero.

* Should anything be captured by the returned Regexp?

* How should anchoring work?

"2004" =~ Regexp.build( 4 ) # => ???

Jamis Buck · Oct 17, 2004

--------------080605000108030807000709
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

So, according to my calculations, 48+ hours have elapsed.

Thus, here's my solution to Regexp.build(). I assumed the following:

1) leading zeros are accepted (ie, "0004" =~ Regexp.build(4) matches)

2) nothing is captured (besides the match itself)

3) "2004" !~ Regexp.build(4)

- Jamis

Ruby said:
The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.grayproductions.net/ruby_quiz/

3. Enjoy!

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

There's been some discussion on Ruby Talk lately about Range.member? which tests
if a given element (often a number) is a member of the set the Range object
iterates over. Obviously, this kind of test is useful in many aspects of
programming, but let's approach this problem from a different angle.

This week's quiz is to build a library that adds a class method called build()
to Regexp. build() should accept a variable number of arguments which can
include integers and ranges of integers. Have build() return a Regexp object
that will match only integers in the set of passed arguments.

Here are some examples of possible usage:

lucky = Regexp.build( 3, 7 )
"7" =~ lucky # => true
"13" =~ lucky # => false
"3" =~ lucky # => true

month = Regexp.build( 1..12 )
"0" =~ month # => false
"1" =~ month # => true
"12" =~ month # => true
day = Regexp.build( 1..31 )
"6" =~ day # => true
"16" =~ day # => true
"Tues" =~ day # => false
year = Regexp.build( 98, 99, 2000..2005 )
"04" =~ year # => false
"2004" =~ year # => true
"99" =~ year # => true

num = Regexp.build( 0..1_000_000 )
"-1" =~ num # => false

Some issues you may want to consider while building you're library:

* How should leading zeros be handled?

Match the hour from a clock formatted in military time (0 to 23). Hours 0
through 9 may or may not have a single leading zero.

* Should anything be captured by the returned Regexp?

* How should anchoring work?

"2004" =~ Regexp.build( 4 ) # => ???

.

--
Jamis Buck
(e-mail address removed)
http://www.jamisbuck.org/jamis

--------------080605000108030807000709
Content-Type: text/plain;
name="regexp.rb"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="regexp.rb"

class Regexp

class NumericRegexpBuilder
def initialize
@patterns = []
end

def add_pattern( pattern )
@patterns << pattern
end

alias :<< :add_pattern

def to_regexp
Regexp.new( "(?:^|[^-])\\b0*(?:" + @patterns.map{|p| "(?:#{p})"}.join( "|" ) + ")\\b" )
end
end

def self.build( *parms )
raise ArgumentError, "expected at least one parameter" if parms.empty?

builder = NumericRegexpBuilder.new
parms.each do |parm|
case parm
when Numeric
builder << parm
when Range
parm.each { |i| builder << i }
else
raise ArgumentError,
"unsupported parm type #{parm.class} (#{parm.inspect})"
end
end

return builder.to_regexp
end

end

if $0 == __FILE__
require 'test/unit'

class TC_Regexp < Test::Unit::TestCase

def test_build_none
assert_raise( ArgumentError ) do
Regexp.build
end
end

def test_build_one_integer
re = Regexp.build( 5 )
assert_match re, "5"
assert_match re, "!5!"
assert_match re, "!00005,"
assert_no_match re, "15"
assert_no_match re, "52"
end

def test_build_multiple_integers
re = Regexp.build( 5, 7, 15 )
assert_match re, "5"
assert_match re, "!5!"
assert_match re, "!00005,"
assert_match re, "015"
assert_match re, "007"
assert_no_match re, "52"
assert_no_match re, "57"
assert_no_match re, "070"
end

def test_build_one_range
re = Regexp.build( 0..100 )
assert_match re, "000"
assert_match re, "052"
assert_match re, "15,32"
assert_match re, "100"
assert_no_match re, "777"
assert_no_match re, "101"
end

def test_build_multiple_ranges
re = Regexp.build( 0..10, 20...35, 71..77 )
assert_match re, "000"
assert_match re, "34"
assert_match re, "000072"
assert_no_match re, "11"
assert_no_match re, "35"
end

def test_mix_and_match
re = Regexp.build( 0, 5, 10..15, 17, 21, 31...35, 70...100 )
assert_match re, "0"
assert_match re, "005"
assert_match re, "012"
assert_no_match re, "22"
assert_no_match re, "35"
assert_no_match re, "100"
end

def test_negative
re = Regexp.build( 0..5 )
assert_no_match re, "-1"
re = Regexp.build( -5..5 )
assert_no_match re, "-1"
end

end
end

--------------080605000108030807000709--

James Edward Gray II · Oct 18, 2004

So, according to my calculations, 48+ hours have elapsed.

Thus, here's my solution to Regexp.build(). I assumed the following:

My solution is pretty different and admittedly only so, so in
functionality.

My main idea was to treat all passed parameters as character data.
This solves the leading zeros problem by letting you pass things like
(1..60, "01".."09"). In addition, this approach also allows you to
pass non-numerical data, though that wasn't part of the quiz.

The other main point of my implementation was to not anchor at all.
This may make built Regexps less convenient to use, but by allowing you
to embed them in other patterns it greatly increases usability. For
example, if you would like to allow for arbitrary leading zeros, you
just embed the result of build() in another Regexp object with a
leading "0*". You can use embedding to provide whatever anchoring you
need, setup your own captures, or even to combine several built Regexp
objects.

Well, all that is how I intended this to work. It even gets close at
times. <laughs> Unfortunately, my character collapsing system (to
regex character classes) is dog slow and only works correctly on
numerical data. Put simply, my library makes the quiz's (1..1_000_000)
example impractical in build time. If I had it to do over, I would
approach this part of the problem from a completely different angle.
This is the one I built to throw away, as the saying goes.

I'll post my library below, and then my unit tests, which probably
better convey what I was aiming for.

James Edward Gray II

#!/usr/bin/env ruby

class Regexp
def self.build( *nums )
nums = nums.map { |e| Array(e) }.flatten.map { |e| String(e) }
nums = nums.sort_by { |e| [-e.length, e] }

patterns = [ ]
while nums.size > 0
eq, nums = nums.partition { |e| e.length == nums[0].length }
patterns.push(*build_char_classes( eq ))
end

/(?:#{patterns.join("|")})/
end

private

def self.build_char_classes( eq_len_strs )
results = [ ]

while eq_len_strs.size > 1
first = eq_len_strs.shift
if md = /^([^\[]*)([^\[])(.*)$/.match(first)
chars = md[2]
matches, eq_len_strs = eq_len_strs.partition do |e|
e =~ /^#{md[1]}(.)#{Regexp.escape md[3]}$/ and chars << $1
end
if matches.size == 0
results << first
next
end

chars = build_short_class(chars.squeeze)
eq_len_strs << "#{md[1]}[#{chars}]#{md[3]}"
else
results << first
end
end
results << eq_len_strs[0] if eq_len_strs.size == 1

results
end

def self.build_short_class( char_class )
while md = /[^\-\0]{3,}/.match(char_class)
short = md[0][1..-1].split("").inject(md[0][0, 1]) do |mem, c|
if (mem.length == 1 or mem[-2] != ?-) and mem[-1, 1].succ == c
mem + "-" + c
elsif mem[-2, 2] =~ /-(.)/ and $1.succ == c
mem[0..-2] + c
else
mem + c
end
end
char_class.sub!(md[0], short.split("").join("\0"))
end

char_class.tr!("\0", "")
char_class.gsub!(/([^\-])-([^\-])/) do |m|
if $1.succ == $2 then $1 + $2 else m end
end
char_class
end
end

=== Unit Tests ===

#!/usr/bin/env ruby

# Usage: ruby -r regexp_build_lib $0

require "test/unit"

class TestRegexpBuild < Test::Unit::TestCase
def test_integers
lucky = /^#{Regexp.build(3, 7)}$/
assert_match(lucky, "7")
assert_no_match(lucky, "13")
assert_match(lucky, "3")

month = /^#{Regexp.build(1..12)}$/
assert_no_match(month, "0")
assert_match(month, "1")
assert_match(month, "12")
day = /^#{Regexp.build(1..31)}$/
assert_match(day, "6")
assert_match(day, "16")
assert_no_match(day, "Tues")
year = /^#{Regexp.build(98, 99, 2000..20005)}$/
assert_no_match(year, "04")
assert_match(year, "2004")
assert_match(year, "99")

num = /^#{Regexp.build(1..1_000)}$/
assert_no_match(num, "-1")
(-10_000..10_000).each do |i|
if i < 1 or i > 1_000
assert_no_match(num, i.to_s)
else
assert_match(num, i.to_s)
end
end
end

def test_embed
month = Regexp.build("01".."09", 1..12)
day = Regexp.build("01".."09", 1..31)
year = Regexp.build(95..99, "00".."05")
date = /\b#{month}\/#{day}\/(?:19|20)?#{year}\b/

assert_match(date, "6/16/2000")
assert_match(date, "12/3/04")
assert_match(date, "Today is 09/15/2004")
assert_no_match(date, "Fri Oct 15")
assert_no_match(date, "13/3/04")
assert_no_match(date, "There's no date hiding in here: 00/00/00!")

md = /^(#{Regexp.build(1..12)})$/.match("11")
assert_not_nil(md)
assert_equal(md[1], "11")
end

def test_words
animal = /^#{Regexp.build("cat", "bat", "rat", "dog")}$/
assert_match(animal, "cat")
assert_match(animal, "dog")
assert_no_match(animal, "Wombat")
end
end

Tanaka Akira · Oct 18, 2004

Ruby Quiz said:
There's been some discussion on Ruby Talk lately about Range.member? which tests
if a given element (often a number) is a member of the set the Range object
iterates over. Obviously, this kind of test is useful in many aspects of
programming, but let's approach this problem from a different angle.

This week's quiz is to build a library that adds a class method called build()
to Regexp. build() should accept a variable number of arguments which can
include integers and ranges of integers. Have build() return a Regexp object
that will match only integers in the set of passed arguments.

def Regexp.build(*args)
args = args.map {|arg| Array(arg) }.flatten.uniq.sort
neg, pos = args.partition {|arg| arg < 0 }
/\A(?:-0*#{Regexp.union(*neg.map {|arg| (-arg).to_s })}|0*#{Regexp.union(*pos.map {|arg| arg.to_s })})\z/
end

It use Regexp.union which is introduced at Ruby 1.8.1.

Since it expands all ranges, long ranges such as
Regexp.build( 0..1_000_000 ) works slowly, though.

Mark Hubbart · Oct 18, 2004

There's been some discussion on Ruby Talk lately about Range.member? which tests

if a given element (often a number) is a member of the set the Range object
iterates over. Obviously, this kind of test is useful in many aspects of
programming, but let's approach this problem from a different angle.

This week's quiz is to build a library that adds a class method called build()
to Regexp. build() should accept a variable number of arguments which can
include integers and ranges of integers. Have build() return a Regexp object
that will match only integers in the set of passed arguments.

Well, I came up with two solutions... But I wasn't able to complete
both of them. Here's the easier one; It splats ranges into arrays,
flattens the whole mess, makes sure they are all integers, uniqs them,
then joins them with pipes and embeds them in a regular expression. It
allows an arbitrary number of leading zeros, and anchors to the start
and end of the line. I didn't really spend any time on it, so I didn't
bother with negative values; I'm not sure what they would do.

def Regexp.build(*args)
# splat ranges into arrays of numbers, convert to integers,
# remove duplicates, and sort the list
numbers = args.map{|n| [*n] }.flatten.map{|n| n.to_i }
# create a range from the list of numbers
/^0*(?:#{numbers.uniq.join("|")})$/
end

My other solution was a lot better; I just got too frustrated trying
to make it. I kept thinking I had it working, then a border case would
pop up that didn't work properly. So, I'll submit the idea for
inspection, maybe someone will pick it up and run with it.

Here's the basic algorithm:

1. break the range up into regexp friendly sections, like this:
(23..1024) =>
23..29,
30..99,
100..999,
1000..1019,
1020..1024

2. convert each range into a string regexp:
23..29 => "2[3-9]"
30..99 => "[3-9]\\d"
100..999 => "[1-9]\\d\\d"
1000..1019 => "10[01]\\d"
1020..1024 => "102[0-4]"

3. join them all together
/^0*(?:2[3-9]|[3-9]\d|[1-9]\d\d|10[01]\d|102[0-4])$/

That gets the most compact regexp for each range. But I've been
beating my head against the wall getting it programmed. I'll keep
thinking about it, but...

Anyone?

cheers,
Mark

Mark Hubbart · Oct 18, 2004

Well, I came up with two solutions... But I wasn't able to complete
both of them.

Okay, I guess posting this to the list must have jostled my brain
around and knocked something loose... because soon after I posted
that, I figured out that I had been looking at it the wrong way. So,
look at the end of this email for my final version.

My stipulations:
1. anchored at the beginning and end (of the line).
2. an arbitrary number of leading zeros should be allowed
3. any integer range should be allowed, even those that are partially
or wholly negative

I decided that Ranges act like regexen quite often, so it could make
sense for there to be a #to_re method. So, I created a to_re method
that converts the range into a regular expression. The algorithm used
creates small regexps (compared to splatting and joining the ranges),
so things are much faster than in the brute force method. The actual
Regexp::build method just gives a Regexp union of the results, after
making regexen out of any integer arguments.

I defined the basic algorithm I was working on in my other email like this:

1. break the range up into regexp friendly sections, like this:
(23..1024) =>
23..29,
30..99,
100..999,
1000..1019,
1020..1024

2. convert each range into a string regexp:
23..29 => "2[3-9]"
30..99 => "[3-9]\\d"
100..999 => "[1-9]\\d\\d"
1000..1019 => "10[01]\\d"
1020..1024 => "102[0-4]"

3. join them all together
/^0*(?:2[3-9]|[3-9]\d|[1-9]\d\d|10[01]\d|102[0-4])$/

I got stuck on step one, trying to break up the range. I could tell it
needed to be done in two halves, as my code below shows; but I was
stuck on second half, counting up to the end of the range. I solved
that problem when I realized that I should count down to the middle,
just like I counted up. That sometimes left a middle section behind,
which gets turned into one last range and bundled in with the rest.

Negative ranges are supported by converting them into positive ranges,
then prefixing a - sign in the regexp. If a range spans zero, I split
it in two parts, then join the regexen.

Here's an example of a constructed regexp. Matching is practically
instantaneous, even for large ones:

huge = Regexp.build(0..928346798726)
==>/^(?-mix:0*\d|[1-9]\d|[1-9]\d\d|[1-9]\d\d\d|[1-9]\d\d\d\d|[1-9]\d\d\d\d\d|[1-9]\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d|[1-9]\d\d\d\d\d\d\d\d\d\d|[1-8]\d\d\d\d\d\d\d\d\d\d\d|9[01]\d\d\d\d\d\d\d\d\d\d|92[0-7]\d\d\d\d\d\d\d\d\d|928[0-2]\d\d\d\d\d\d\d\d|9283[0-3]\d\d\d\d\d\d\d|92834[0-5]\d\d\d\d\d\d|928346[0-6]\d\d\d\d\d|9283467[0-8]\d\d\d\d|92834679[0-7]\d\d\d|928346798[0-6]\d\d|9283467987[01]\d|92834679872[0-6])$/

"2342234" =~ huge
==>0

The code follows:

def Regexp.build(*args)
ranges, numbers = args.partition{|item| Range === item}
re = ranges.map{|r| r.to_re } + numbers.map{|n| /0*#{n}/ }
/^#{Regexp.union(*re)}$/
end

class Range
def to_re
# normalize the range format; we want end inclusive, integer ranges
# this part passes the load off to a newly built range if needed.
if exclude_end?
return( (first.to_i..last.to_i - 1).to_re )
elsif ! (first + last).kind_of?(Integer)
return( (first.to_i .. last.to_i).to_re )
end

# Deal with ranges that are wholly or partially negative. If range is
# only partially negative, split in half and get two regexen. join them
# together for the finish. If the range is wholly negative, make it
# positive, then add a negative sign to the regexp
if first < 0 and last < 0
# return a negatized version of the regexp
return /-#{(-last..-first).to_re}/
elsif first < 0
neg = (first..-1).to_re
pos = (0..last).to_re
return /(?:#{neg}|#{pos})/
end

### First, create an array of new ranges that are more
### suited to regex conversion.

# a and z will be the remainders of the endpoints of the range
# as we slice it
a, z = first, last

# build the first part of the list of new ranges.
list1 = []
num = first
until num > z
a = num # recycle the value
# get the first power of ten that leaves a remainder
v = 10
v *= 10 while num % v == 0 and num != 0
# compute the next value up
num += v - num % v
# store the value unless it's too high
list1 << (a..num-1) unless num > z
end

# build the second part of the list; counting down.
list2 = []
num = last + 1
until num < a
z = num - 1 # recycle the value
# slice to the nearest power of ten
v = 10
v *= 10 while num % v == 0 and num != 0
# compute the next value down
num -= num % v
# store the value if it fits
list2 << (num..z) unless num < a
end
# get the chewey center part, if needed
center = a < z ? [a..z] : []
# our new list
list = list1 + center + list2.reverse

### Next, convert each range to a regexp.
list.map! do |rng|
a, z = rng.first.to_s, rng.last.to_s
a.split(//).zip(z.split(//)).map do |(f,l)|
case
when f == l then f
when f.to_i + 1 == l.to_i then "[%s%s]" % [f,l]
when f+l == "09" then "\\d"
else
"[%s-%s]" % [f,l]
end
end.join # returns the regexp for *that* range
end

### Last, return the final regexp
/0*#{ list.join("|") }/
end
end

Thanks for a great quiz! It really twisted my brain something fierce.

cheers,
Mark

James Edward Gray II · Oct 18, 2004

Thanks for a great quiz! It really twisted my brain something fierce.

You twisted mine too! Very nice work.

You're super clever solution unblocked my own mind and I was able to
build what I originally intended.

Feed it:

Regexp.build(1..1_000_000)

and it will return:

/\b0*(?:[1-9]|[1-9]\d|[1-9]\d\d|[1-9]\d\d\d|[1-9]\d\d\d\d|[1
-9]\d\d\d\d\d|1000000)\b/

Though there is a substantial wait involved.

This is what I was originally after.

James Edward Gray II

#!/usr/bin/env ruby

class Regexp
def self.build( *nums )
nums.sort! { |a, b| sort_chunks a, b }

patterns = [ ]
nums.each_index do |index|
if nums[index].kind_of? Range
nums[index].each do |n|
diff = compare_chunks(patterns[-1], n)
if diff == 1
patterns << combine(patterns.pop, String(n))
elsif diff != 0
patterns << String(n)
end
end
else
diff = compare_chunks(patterns[-1], nums[index])
if diff == 1
patterns << combine(patterns.pop, String(nums[index]))
elsif diff != 0
patterns << String(nums[index])
end
end
end

patterns.each { |e| e.gsub!(/\[([^\]]+)\]/) { shorten_char_class($1)
} }
/\b0*(?:#{patterns.join("|")})\b/
end

private

def self.combine( pat, str )
(0...str.length).each do |i|
if md = / ^( (?: [^\[\]] | \[[^\]]+\] ){#{i}} )
( [^\[\]] | \[[^\]]+\] ) (.*)$ /x.match(pat)
if str[i, 1] !~ /#{md[2]}/
new_pat = md[2][-1, 1] == "]" ?
"#{md[1]}#{md[2][0..-2] + str[i, 1]}]#{md[3]}" :
"#{md[1]}[#{md[2] + str[i, 1]}]#{md[3]}"
break new_pat
end
else
raise "Unexpected pattern format error: #{pat} !~ #{str}."
end
end
end

def self.compare_chunks( a, b )
return 2 if a.nil?

a = a.kind_of?(Range) ? String(a.first) : String(a)
b = b.kind_of?(Range) ? String(b.first) : String(b)

diff = 0
i = 0
while mda =
/^(?:[^\[\]]|\[[^\]]+\]){#{i}}([^\[\]]|\[[^\]]+\])/.match(a)
unless mdb = / ^(?: [^\[\]] | \[[^\]]+\] ){#{i}}
( [^\[\]] | \[[^\]]+\] ) /x.match(b)
return 2
end

if mda[1][-1, 1] == "]" and mdb[1][-1, 1] == "]"
return 2 if mda[1] != mdb[1]
elsif mda[1][-1, 1] == "]"
diff += 1 if mdb[1] !~ /#{mda[1]}/
elsif mdb[1][-1, 1] == "]"
diff += 1 if mda[1] !~ /#{mdb[1]}/
else
diff += 1 if mda[1] != mdb[1]
end
i += 1
end
if /^(?:[^\[\]]|\[[^\]]+\]){#{i}}([^\[\]]|\[[^\]]+\])/.match(b)
return 2
end
diff
end

def self.shorten_char_class( char_class )
char_class = char_class.split("").sort.join

return "\\d" if char_class == "0123456789"

while md = /[^\-\0]{3,}/.match(char_class)
short = md[0][1..-1].split("").inject(md[0][0, 1]) do |mem, c|
if (mem.length == 1 or mem[-2] != ?-) and mem[-1, 1].succ == c
mem + "-" + c
elsif mem[-2, 2] =~ /-(.)/ and $1.succ == c
mem[0..-2] + c
else
mem + c
end
end
char_class.sub!(md[0], short.split("").join("\0"))
end

char_class.tr!("\0", "")
char_class.gsub!(/([^\-])-([^\-])/) do |m|
if $1.succ == $2 then $1 + $2 else m end
end
"[#{char_class}]"
end

def self.sort_chunks( a, b )
a = a.kind_of?(Range) ? String(a.first) : String(a)
b = b.kind_of?(Range) ? String(b.first) : String(b)

return a.length - b.length if a.length != b.length

diff = 0
(0...a.length).each { |i| diff += 1 if a != b }
diff
end
end

Thomas Leitner · Oct 19, 2004

On Mon, 18 Oct 2004 17:02:30 +0900

| I defined the basic algorithm I was working on in my other email like
| this:
|
| > 1. break the range up into regexp friendly sections, like this:
| > (23..1024) =>
| > 23..29,
| > 30..99,
| > 100..999,
| > 1000..1019,
| > 1020..1024
| >
| > 2. convert each range into a string regexp:
| > 23..29 => "2[3-9]"
| > 30..99 => "[3-9]\\d"
| > 100..999 => "[1-9]\\d\\d"
| > 1000..1019 => "10[01]\\d"
| > 1020..1024 => "102[0-4]"
| >
| > 3. join them all together
| > /^0*(?:2[3-9]|[3-9]\d|[1-9]\d\d|10[01]\d|102[0-4])$/
|

I have had the same idea once my simple solution (using each integer in a range) failed with large ranges.

I assume the following:

- leading zeros are not accepted
- nothing is captured
- must match the whole line (anchored to start and end of line)

Output for this file on my machine:

[penguin 73] ~/work/rubyquiz > ruby 4_20041018_regexp.rb
Loaded suite 4_20041018_regexp
Started
.....
Finished in 0.021428 seconds.

5 tests, 108 assertions, 0 failures, 0 errors
[penguin 74] ~/work/rubyquiz >

So, better late than never, here is my code!

Thomas

ps. I tried to shorten the new Range methods as good as I could, can anything else be done??

------------------------------------------------------------------------
require 'test/unit/ui/console/testrunner'

class Integer

def to_rstr
"#{self}"
end

end

class Range

def get_regexps( a, b, negative = false )
arr = [a]

af = (a == 0 ? 1.0 : a.to_f)
bf = (b == 0 ? 1.0 : b.to_f)
1.upto( b.to_s.length-1 ) do |i|
pot = 10**i
num = (af/pot).ceil*(pot) # next higher number with i zeros
arr.insert( i, num ) if num < b
num = (bf/pot).floor*(pot) # next lower number with i zeros
arr.insert( -i, num )
end
arr.uniq!
arr.push( b+1 ) # +1 -> to handle it in the same way as the other elements

result = []
0.upto( arr.length - 2 ) do |i|
first = arr.to_s
second = (arr[i+1] - 1).to_s
str = ''
0.upto( first.length-1 ) do |j|
if first[j] == second[j]
str << first[j]
else
str << "[#{first[j].chr}-#{second[j].chr}]"
end
end
result << str
end

result = result.join('|')
result = "-(?:#{result})" if negative
result
end

def to_rstr
if first < 0 && last < 0
get_regexps( -last, -first, true )
elsif first < 0
get_regexps( 1, -first, true ) + "|" + get_regexps( 0, last )
else
get_regexps( first, last )
end
end

end

class Regexp

def self.build( *args )
Regexp.new("^(?:" + args.collect {|a| a.to_rstr}.flatten.uniq.join('|') + ")$" )
end

end

class RegexpTest < Test::Unit::TestCase

def rangeTest( first, last )
r = Regexp.build( first..last )
assert_match( r, "#{first}" )
assert_match( r, "#{(first + last)/2}" )
assert_match( r, "#{last}" )
assert_no_match( r, "#{first-1}" )
assert_no_match( r, "#{last+1}" )
end

def testBuild
lucky = Regexp.build( 3, 7 )
assert_match(lucky, "7")
assert_no_match(lucky, "13")
assert_match(lucky, "3")

rangeTest( 1, 12 )
month = Regexp.build( 1..12 )
assert_no_match(month, "0")
assert_match(month, "1")
assert_match(month, "12")

rangeTest( 1, 31 )
day = Regexp.build( 1..31 )
assert_match(day, "6")
assert_match(day, "16")
assert_no_match(day, "Tues")

rangeTest( 2000, 2005 )
year = Regexp.build( 98, 99, 2000..2005 )
assert_no_match(year, "04")
assert_match(year, "2004")
assert_match(year, "99")

rangeTest( 0, 1000000 )
num = Regexp.build( 0..1_000_000 )
assert_no_match(num, "-1")
end

def testPositive
rangeTest( 2, 10 )
rangeTest( 23432, 12312123 )
end

def testNegative
rangeTest( -10, -2 )
rangeTest( -15, 4 )
rangeTest( -100342, -343 )
end

def testOther
rangeTest( 5, 16 )
rangeTest( 10, 100 )
rangeTest( 11, 99 )
rangeTest( 1, 123456789 )
rangeTest( 10, 10 )
rangeTest( 5, 5 )
rangeTest( 0, 5 )
rangeTest( 0, 10 )
rangeTest( 1, 5 )
end

def testIllegal
num = Regexp.build( 1..12 )
assert_no_match( num, "012" )
assert_no_match( num, "A12" )
assert_no_match( num, "120" )
assert_no_match( num, "12A" )
assert_no_match( num, "3125" )
end

end

James Edward Gray II · Oct 20, 2004

ps. I tried to shorten the new Range methods as good as I could, can
anything else be done??

I'm not sure about shortening the methods Thomas, but it strikes me
that get_regexps() should be a class method. Just a suggestion.

James Edward Gray II

Taskcproblem calendar	4	Aug 31, 2023
Collect Excel Data from Website	5	Apr 30, 2022
[QUIZ] Hamurabi (#223)	20	Oct 31, 2009
[QUIZ] DictionaryMatcher (#103)	15	Nov 24, 2006
[QUIZ] Finding Quiz Responses (#200)	2	Apr 10, 2009
[QUIZ] URL Shenanigans (#220)	1	Sep 14, 2009
[QUIZ] Gathering Ruby Quiz 2 Data (#189)	10	Jan 23, 2009
[QUIZ] Music Theory (#229)	27	Feb 26, 2010

[QUIZ] Regexp.build() (#4)

Ruby Quiz

Jamis Buck

James Edward Gray II

Tanaka Akira

Mark Hubbart

Mark Hubbart

James Edward Gray II

Thomas Leitner

James Edward Gray II

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads