Help with a regexp

Daniel Schierbeck · Jul 12, 2006

I'm trying to write a regular expression that matches bencoded strings,
i.e. strings on the form x:y, where x is the numeric length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still nil
at that point. Here's what I've got so far:

/^([1-9]+\d*)

\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did it
another way?

Cheers,
Daniel

Pit Capitain · Jul 12, 2006

Daniel said:
I'm trying to write a regular expression that matches bencoded strings,
i.e. strings on the form x:y, where x is the numeric length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still nil
at that point. Here's what I've got so far:

/^([1-9]+\d*)\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did it
another way?

Daniel, I'm pretty sure you can't do it with a single regexp test. Why
not split the test in two parts, as in

str =~ /^([1-9]\d*)

\w+)$/ && $2.size == $1.to_i

Regards,
Pit

studlee2 · Jul 12, 2006

Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*)

\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end

I hope this helps.

_Steve

Logan Capaldo · Jul 12, 2006

I'm trying to write a regular expression that matches bencoded
strings, i.e. strings on the form x:y, where x is the numeric
length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still
nil at that point. Here's what I've got so far:

/^([1-9]+\d*)\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did
it another way?

Cheers,
Daniel

I believe that this is not in the set of languages a Regexp can
match. As others have already suggested you'll have to do it in 2 steps.

Daniel Schierbeck · Jul 12, 2006

Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*)\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end

Thank you for your reply.

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what
makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

Cheers,
Daniel

dblack · Jul 12, 2006

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep", which I
need to separate into ["foo", "monkey", "sheep"]. The values can contain
numeric values, so splitting at \d won't work. This is what makes it
difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the next
number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

Marcin MielÅ¼yÅ„ski · Jul 12, 2006

Daniel said:
My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what
makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

"3:foo6:monkey5:sheep".scan(/(\d+)\

[^\d]+)/){|(num,str)|
if num.to_i == str.length
# correct
else
# not correct
end
}

lopex

Logan Capaldo · Jul 12, 2006

Hi --

My problem is that I have a string like this:
"3:foo6:monkey5:sheep", which I need to separate into ["foo",
"monkey", "sheep"]. The values can contain numeric values, so
splitting at \d won't work. This is what makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then
read the next number, etc.

Click to expand...

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

David

I don't believe you do. OTOH I'm not sure if you should care. If
one's invalid, the whole thing is invalid.

Anyway, here's my solution:

% cat n_colon_s.rb
require 'strscan'
def parse_n_colon_s(s)
scanner = StringScanner.new(s)
results = []
until scanner.empty?
if scanner.scan(/(\d+):/)
n = scanner[1].to_i
raise 'Malformed String' unless (s = scanner.scan(/\w{#{n}}/))
results << s
else
raise 'Malformed string'
end
end
results
end

p parse_n_colon_s("2:ab3:cat")
p parse_n_colon_s("2:ab23:cat")

% ruby n_colon_s.rb
["ab", "cat"]
-:18:in `parse_n_colon_s': Malformed String (RuntimeError)
from -:28

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

Tom Werner · Jul 12, 2006

Daniel said:
Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*)\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end

Click to expand...

Thank you for your reply.

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values
can contain numeric values, so splitting at \d won't work. This is
what makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read
the next number, etc.

Cheers,
Daniel

require 'strscan'

array = []
ss = StringScanner.new('3:ab23:cat5:sheep')

while !ss.eos?
len = ss.scan(/\d+:/).chop.to_i
array << ss.peek(len)
ss.pos += len
end

p array

=> ["ab2", "cat", "sheep"]

Tom

--
Tom Werner
Helmets to Hardhats
Software Developer
(e-mail address removed)
www.helmetstohardhats.org

Phrogz · Jul 12, 2006

require 'strscan'
s = StringScanner.new( "3:ab23:cat5:sheep" )
words = []
until s.eos?
if digits = s.scan( /\d+/ )
digits = digits.to_i
s.pos += 1
words << s.peek( digits )
s.pos += digits
else
p words
abort "Couldn't find digits for: #{s.rest}"
end
end
p words
#=> ["ab2", "cat", "sheep"]

dblack · Jul 12, 2006

Hi --

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what makes
it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

Click to expand...

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

David

Click to expand...

I don't believe you do. OTOH I'm not sure if you should care. If one's
invalid, the whole thing is invalid.

OK, but how do you know whether or not this is valid?

3:ab23:cat

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

ara.t.howard · Jul 12, 2006

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep", which
I need to separate into ["foo", "monkey", "sheep"]. The values can contain
numeric values, so splitting at \d won't work. This is what makes it
difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

Click to expand...

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

the second. it's invalid here

2:ab23:cat
^
^

virtual position 10.

harp:~ > cat a.rb
require 'strscan'

class List < ::Array
def initialize s
parse s
end
class ParseError < ::StandardError; end
def parse s
clear
ss = StringScanner.new s.to_s
until ss.eos?
n = ss.scan %r/^\d+:/
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless n

n = n.to_i
raise RangeError, "bad length <#{ n }>" if n < 0

word = ss.scan(%r/.{0,#{ n }}/).to_s

raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless
word.size == n

self << word
end
end
end

list = List.new "3:ab23:cat5:sheep"
p list

list = List.new "2:ab23:cat"
p list

harp:~ > ruby a.rb
["ab2", "cat", "sheep"]
a.rb:20:in `parse': syntax error beginning @ pos <10> (List:

arseError)
from a.rb:5:in `initialize'
from a.rb:31

regards.

-a

ara.t.howard · Jul 12, 2006

OK, but how do you know whether or not this is valid?

3:ab23:cat

it is valid.

harp:~ > cat a.rb
require 'strscan'

class List < ::Array
def initialize s
parse s
end
class ParseError < ::StandardError; end
def parse s
clear
ss = StringScanner.new s.to_s
until ss.eos?
n = ss.scan %r/^\d+:/
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless n
n = n.to_i
raise RangeError, "bad length <#{ n }>" if n < 0
word = ss.scan(%r/.{0,#{ n }}/).to_s
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless
word.size == n
self << word
end
end
end

list = List.new "3:ab23:cat5:sheep"
p list

list = List.new "3:ab23:cat"
p list

harp:~ > ruby a.rb
["ab2", "cat", "sheep"]
["ab2", "cat"]

regards.

-a

Phrogz · Jul 12, 2006

Here's what I consider a slightly cleaner, more robust version:

require 'strscan'
inp = "3:ab23:cat5:sheep"
s = StringScanner.new( inp )
words = []
until s.eos?
begin
unless digits = s.scan( /\d+(?=

/ )
raise "I can't find an integer followed by a colon"
end
s.pos += 1 # skip the colon we know is there
digits = digits.to_i
unless s.rest_size >= digits
raise "I ran out of characters; looking for #{digits} characters,
#{s.rest_size} left"
end
words << s.peek( digits )
s.pos += digits
rescue RuntimeError => e
warn "Looking at #{s.rest.inspect},"
warn e.message
abort "Words found so far: #{words.inspect}"
end
end
puts "Words found: ", words

dblack · Jul 12, 2006

Hi --

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what makes
it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

Click to expand...

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

Click to expand...

the second. it's invalid here

2:ab23:cat
^
^

virtual position 10.

How do you know? The first one could be wrong and the second one
right.

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

dblack · Jul 12, 2006

Hi --

it is valid.

I guess what I mean is: how do you know whether it's valid by
coincidence but actually wrong?

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

ara.t.howard · Jul 12, 2006

How do you know? The first one could be wrong and the second one
right.

because that's the specification:

so, in this case, the parse follows these steps:

- read '2:'
- read 2 chars
- read '23:'
- read 23 chars -> ERROR

i think you are confusing the syntax of the structure for the the logic of it.
syntax wise '2:ab' is totally valid and so that token pair is consumed. of
course it's the case that the user meant

3:ab23:cat

it's not up to parsers to detect logic errors, only syntax errors. in this
case the left to right quality (no backtracking specified) makes it
unambiguous as to which token contains the error - it's the second.

yaml has similar issues, for instance this yaml document

k : v
- 0
- 2
- 3

loads as

{"k"=>[0, 2, 3]}

which is obviously not what is meant. but is it

- k : v
- 0
- 2
- 3

or

k :
- 0
- 2
- 3

both are one char errors. nevertheless it's too much to expect yaml to
determine which was meant or suspect that an error was made.

regards.

-a

ara.t.howard · Jul 12, 2006

Hi --

I guess what I mean is: how do you know whether it's valid by
coincidence but actually wrong?

the same way ruby knows that this

a = false and must_call_this_function_but_do_not!

is valid but actually wrong -- it doesn't! that's why logic errors are the
worst kind to debug...

fortunately for us too - otherwise we programmers would be out of work!

;-)

-a

dblack · Jul 12, 2006

Hi --

the same way ruby knows that this

a = false and must_call_this_function_but_do_not!

is valid but actually wrong -- it doesn't! that's why logic errors are the
worst kind to debug...

fortunately for us too - otherwise we programmers would be out of work!

I think what I'm groping toward (or not) is a case where backtracking
is necessary before the conclusion is drawn that it's invalid. I know
backtracking wasn't specified... but I wonder whether there are cases
that can't be resolved otherwise. (Just thinking out loud, so to
speak.)

David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me

Daniel Schierbeck · Jul 12, 2006

Phrogz said:
Here's what I consider a slightly cleaner, more robust version:

require 'strscan'
inp = "3:ab23:cat5:sheep"
s = StringScanner.new( inp )
words = []
until s.eos?
begin
unless digits = s.scan( /\d+(?=/ )
raise "I can't find an integer followed by a colon"
end
s.pos += 1 # skip the colon we know is there
digits = digits.to_i
unless s.rest_size >= digits
raise "I ran out of characters; looking for #{digits} characters,
#{s.rest_size} left"
end
words << s.peek( digits )
s.pos += digits
rescue RuntimeError => e
warn "Looking at #{s.rest.inspect},"
warn e.message
abort "Words found so far: #{words.inspect}"
end
end
puts "Words found: ", words

Thanks for the help!

I ended up doing it slightly different (after a lot of pain, and a sneek
peek at a Python script.):

This is an early library that handles bencoding. Note that the error
handling is pretty much non-existent at this point, and it'll break if
you fart on it.

class String
def bencode
"#{length}:#{self}"
end
end

class Numeric
def bencode
"i#{floor}e"
end
end

class Array
def bencode
"l#{map{|obj| obj.bencode}}e"
end
end

class Hash
def bencode
"d#{map{|key, val| [key.bencode, val.bencode]}}e"
end
end

module Bencode
@filter = {:integer => /(0|-?[1-9]+\d*)e/,
:string => /(0|[1-9][0-9]*):/}

class << self
attr_reader :map

def dump(obj)
obj.bencode
end

def load(str)
decode(str, 0).first
end

def decode_integer(data, i)
match = @filter[:integer].match(data[i..-1])
raise ArgumentError if match.nil?
return Integer(match[1]), match.end(0) + i
end

def decode_string(data, i)
match = @filter[:string].match(data[i..-1])
raise ArgumentError if match.nil?
length = Integer(match[1])
offset = match.end(0) + length + i
return data[(match.end(0) + i)...offset], offset
end

def decode_list(data, i)
ary = []
while data[i..i] != "e"
val, i = decode(data, i)
ary << val
end
return ary, i + 1
end

def decode_dict(data, i)
hsh = {}
while data[i..i] != "e"
key, i = decode_string(data, i)
val, i = decode(data, i)
hsh[key] = val
end
return hsh, i + 1
end

def decode(data, i)
case data[i..i]
when "i"
decode_integer(data, i + 1)
when "l"
decode_list(data, i + 1)
when "d"
decode_dict(data, i + 1)
else
decode_string(data, i)
end
end
end
end

Eventually I'll upload it to RubyForge. If you've got comments, bring
'em on, but remember that I only just started this today.

Cheers,
Daniel

Help with code plsss	0	Aug 30, 2023
RegExp - Match specific words, but not if they're inside parenthesis (with or without other words within)	6	Jan 29, 2023
Help With a Script	5	Jul 10, 2021
help with regexp	5	Feb 7, 2013
Need help with a script... (my first!)	6	Sep 24, 2022
Simple regexp question	0	Oct 26, 2005
Issue with regexp pattern matcher withing String#gsub	9	Apr 29, 2009
RegExp	2	Oct 19, 2006

Help with a regexp

Daniel Schierbeck

Pit Capitain

studlee2

Logan Capaldo

Daniel Schierbeck

dblack

Marcin MielÅ¼yÅ„ski

Logan Capaldo

Tom Werner

Phrogz

dblack

ara.t.howard

ara.t.howard

Phrogz

dblack

dblack

ara.t.howard

ara.t.howard

dblack

Daniel Schierbeck

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads