Help with a regexp

D

Daniel Schierbeck

I'm trying to write a regular expression that matches bencoded strings,
i.e. strings on the form x:y, where x is the numeric length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still nil
at that point. Here's what I've got so far:

/^([1-9]+\d*):(\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did it
another way?


Cheers,
Daniel
 
P

Pit Capitain

Daniel said:
I'm trying to write a regular expression that matches bencoded strings,
i.e. strings on the form x:y, where x is the numeric length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still nil
at that point. Here's what I've got so far:

/^([1-9]+\d*):(\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did it
another way?

Daniel, I'm pretty sure you can't do it with a single regexp test. Why
not split the test in two parts, as in

str =~ /^([1-9]\d*):(\w+)$/ && $2.size == $1.to_i

Regards,
Pit
 
S

studlee2

Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*):(\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end


I hope this helps.

_Steve
 
L

Logan Capaldo

I'm trying to write a regular expression that matches bencoded
strings, i.e. strings on the form x:y, where x is the numeric
length of y.

This is valid:

6:foobar

while this is not:

4:foo

I've tried using #{$1} inside the regexp, but it seems $1 is still
nil at that point. Here's what I've got so far:

/^([1-9]+\d*):(\w){#{$1}}$/

(it ain't working)

I'm matching with `string =~ regexp'; would it be better if I did
it another way?


Cheers,
Daniel

I believe that this is not in the set of languages a Regexp can
match. As others have already suggested you'll have to do it in 2 steps.
 
D

Daniel Schierbeck

Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*):(\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end

Thank you for your reply.

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what
makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.


Cheers,
Daniel
 
D

dblack

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep", which I
need to separate into ["foo", "monkey", "sheep"]. The values can contain
numeric values, so splitting at \d won't work. This is what makes it
difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the next
number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
M

Marcin Mielżyński

Daniel said:
My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what
makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

"3:foo6:monkey5:sheep".scan(/(\d+)\:([^\d]+)/){|(num,str)|
if num.to_i == str.length
# correct
else
# not correct
end
}

lopex
 
L

Logan Capaldo

Hi --

My problem is that I have a string like this:
"3:foo6:monkey5:sheep", which I need to separate into ["foo",
"monkey", "sheep"]. The values can contain numeric values, so
splitting at \d won't work. This is what makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then
read the next number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?


David

I don't believe you do. OTOH I'm not sure if you should care. If
one's invalid, the whole thing is invalid.

Anyway, here's my solution:

% cat n_colon_s.rb
require 'strscan'
def parse_n_colon_s(s)
scanner = StringScanner.new(s)
results = []
until scanner.empty?
if scanner.scan(/(\d+):/)
n = scanner[1].to_i
raise 'Malformed String' unless (s = scanner.scan(/\w{#{n}}/))
results << s
else
raise 'Malformed string'
end
end
results
end

p parse_n_colon_s("2:ab3:cat")
p parse_n_colon_s("2:ab23:cat")


% ruby n_colon_s.rb
["ab", "cat"]
-:18:in `parse_n_colon_s': Malformed String (RuntimeError)
from -:28



--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
T

Tom Werner

Daniel said:
Daniel,
When you grab the data it will be in a string format, so you need
to convert it to a number (most likely integer). Then you can compare
it with the size of the second value you grabbed. I would write it
like this (modify as needed):

print "enter data"
a = gets
valid = /^(\d*):(\w*)/

check = valid.match(a)

if(check[1].to_i == check[2].size)

<Your code here>

end

Thank you for your reply.

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values
can contain numeric values, so splitting at \d won't work. This is
what makes it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read
the next number, etc.


Cheers,
Daniel

require 'strscan'

array = []
ss = StringScanner.new('3:ab23:cat5:sheep')

while !ss.eos?
len = ss.scan(/\d+:/).chop.to_i
array << ss.peek(len)
ss.pos += len
end

p array

=> ["ab2", "cat", "sheep"]

Tom

--
Tom Werner
Helmets to Hardhats
Software Developer
(e-mail address removed)
www.helmetstohardhats.org
 
P

Phrogz

require 'strscan'
s = StringScanner.new( "3:ab23:cat5:sheep" )
words = []
until s.eos?
if digits = s.scan( /\d+/ )
digits = digits.to_i
s.pos += 1
words << s.peek( digits )
s.pos += digits
else
p words
abort "Couldn't find digits for: #{s.rest}"
end
end
p words
#=> ["ab2", "cat", "sheep"]
 
D

dblack

Hi --

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what makes
it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?


David

I don't believe you do. OTOH I'm not sure if you should care. If one's
invalid, the whole thing is invalid.

OK, but how do you know whether or not this is valid?

3:ab23:cat


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
A

ara.t.howard

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep", which
I need to separate into ["foo", "monkey", "sheep"]. The values can contain
numeric values, so splitting at \d won't work. This is what makes it
difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

the second. it's invalid here

2:ab23:cat
^
^

virtual position 10.


harp:~ > cat a.rb
require 'strscan'

class List < ::Array
def initialize s
parse s
end
class ParseError < ::StandardError; end
def parse s
clear
ss = StringScanner.new s.to_s
until ss.eos?
n = ss.scan %r/^\d+:/
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless n

n = n.to_i
raise RangeError, "bad length <#{ n }>" if n < 0

word = ss.scan(%r/.{0,#{ n }}/).to_s

raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless
word.size == n

self << word
end
end
end

list = List.new "3:ab23:cat5:sheep"
p list

list = List.new "2:ab23:cat"
p list

harp:~ > ruby a.rb
["ab2", "cat", "sheep"]
a.rb:20:in `parse': syntax error beginning @ pos <10> (List::parseError)
from a.rb:5:in `initialize'
from a.rb:31


regards.

-a
 
A

ara.t.howard

OK, but how do you know whether or not this is valid?

3:ab23:cat

it is valid.


harp:~ > cat a.rb
require 'strscan'

class List < ::Array
def initialize s
parse s
end
class ParseError < ::StandardError; end
def parse s
clear
ss = StringScanner.new s.to_s
until ss.eos?
n = ss.scan %r/^\d+:/
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless n
n = n.to_i
raise RangeError, "bad length <#{ n }>" if n < 0
word = ss.scan(%r/.{0,#{ n }}/).to_s
raise ParseError, "syntax error beginning @ pos <#{ ss.pos }>" unless
word.size == n
self << word
end
end
end

list = List.new "3:ab23:cat5:sheep"
p list

list = List.new "3:ab23:cat"
p list

harp:~ > ruby a.rb
["ab2", "cat", "sheep"]
["ab2", "cat"]


regards.

-a
 
P

Phrogz

Here's what I consider a slightly cleaner, more robust version:

require 'strscan'
inp = "3:ab23:cat5:sheep"
s = StringScanner.new( inp )
words = []
until s.eos?
begin
unless digits = s.scan( /\d+(?=:)/ )
raise "I can't find an integer followed by a colon"
end
s.pos += 1 # skip the colon we know is there
digits = digits.to_i
unless s.rest_size >= digits
raise "I ran out of characters; looking for #{digits} characters,
#{s.rest_size} left"
end
words << s.peek( digits )
s.pos += digits
rescue RuntimeError => e
warn "Looking at #{s.rest.inspect},"
warn e.message
abort "Words found so far: #{words.inspect}"
end
end
puts "Words found: ", words
 
D

dblack

Hi --

Hi --

My problem is that I have a string like this: "3:foo6:monkey5:sheep",
which I need to separate into ["foo", "monkey", "sheep"]. The values can
contain numeric values, so splitting at \d won't work. This is what makes
it difficult:

"3:ab23:cat5:sheep" => ["ab2", "cat", "sheep"]

I need to grab the number, then read that many characters, then read the
next number, etc.

How do you know, in the case of:

2:ab23:cat

which of the two is invalid?

the second. it's invalid here

2:ab23:cat
^
^

virtual position 10.

How do you know? The first one could be wrong and the second one
right.


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
A

ara.t.howard

How do you know? The first one could be wrong and the second one
right.

because that's the specification:

so, in this case, the parse follows these steps:

- read '2:'
- read 2 chars
- read '23:'
- read 23 chars -> ERROR

i think you are confusing the syntax of the structure for the the logic of it.
syntax wise '2:ab' is totally valid and so that token pair is consumed. of
course it's the case that the user meant

3:ab23:cat

it's not up to parsers to detect logic errors, only syntax errors. in this
case the left to right quality (no backtracking specified) makes it
unambiguous as to which token contains the error - it's the second.

yaml has similar issues, for instance this yaml document

k : v
- 0
- 2
- 3

loads as

{"k"=>[0, 2, 3]}

which is obviously not what is meant. but is it

- k : v
- 0
- 2
- 3

or

k :
- 0
- 2
- 3

both are one char errors. nevertheless it's too much to expect yaml to
determine which was meant or suspect that an error was made.

regards.

-a
 
A

ara.t.howard

Hi --



I guess what I mean is: how do you know whether it's valid by
coincidence but actually wrong?

the same way ruby knows that this

a = false and must_call_this_function_but_do_not!

is valid but actually wrong -- it doesn't! that's why logic errors are the
worst kind to debug...

fortunately for us too - otherwise we programmers would be out of work!

;-)

-a
 
D

dblack

Hi --

the same way ruby knows that this

a = false and must_call_this_function_but_do_not!

is valid but actually wrong -- it doesn't! that's why logic errors are the
worst kind to debug...

fortunately for us too - otherwise we programmers would be out of work!

I think what I'm groping toward (or not) is a case where backtracking
is necessary before the conclusion is drawn that it's invalid. I know
backtracking wasn't specified... but I wonder whether there are cases
that can't be resolved otherwise. (Just thinking out loud, so to
speak.)


David

--
http://www.rubypowerandlight.com => Ruby/Rails training & consultancy
http://www.manning.com/black => RUBY FOR RAILS, the Ruby book for
Rails developers
http://dablog.rubypal.com => D[avid ]A[. ]B[lack's][ Web]log
(e-mail address removed) => me
 
D

Daniel Schierbeck

Phrogz said:
Here's what I consider a slightly cleaner, more robust version:

require 'strscan'
inp = "3:ab23:cat5:sheep"
s = StringScanner.new( inp )
words = []
until s.eos?
begin
unless digits = s.scan( /\d+(?=:)/ )
raise "I can't find an integer followed by a colon"
end
s.pos += 1 # skip the colon we know is there
digits = digits.to_i
unless s.rest_size >= digits
raise "I ran out of characters; looking for #{digits} characters,
#{s.rest_size} left"
end
words << s.peek( digits )
s.pos += digits
rescue RuntimeError => e
warn "Looking at #{s.rest.inspect},"
warn e.message
abort "Words found so far: #{words.inspect}"
end
end
puts "Words found: ", words

Thanks for the help!

I ended up doing it slightly different (after a lot of pain, and a sneek
peek at a Python script.):

This is an early library that handles bencoding. Note that the error
handling is pretty much non-existent at this point, and it'll break if
you fart on it.


class String
def bencode
"#{length}:#{self}"
end
end

class Numeric
def bencode
"i#{floor}e"
end
end

class Array
def bencode
"l#{map{|obj| obj.bencode}}e"
end
end

class Hash
def bencode
"d#{map{|key, val| [key.bencode, val.bencode]}}e"
end
end

module Bencode
@filter = {:integer => /(0|-?[1-9]+\d*)e/,
:string => /(0|[1-9][0-9]*):/}

class << self
attr_reader :map

def dump(obj)
obj.bencode
end

def load(str)
decode(str, 0).first
end

def decode_integer(data, i)
match = @filter[:integer].match(data[i..-1])
raise ArgumentError if match.nil?
return Integer(match[1]), match.end(0) + i
end

def decode_string(data, i)
match = @filter[:string].match(data[i..-1])
raise ArgumentError if match.nil?
length = Integer(match[1])
offset = match.end(0) + length + i
return data[(match.end(0) + i)...offset], offset
end

def decode_list(data, i)
ary = []
while data[i..i] != "e"
val, i = decode(data, i)
ary << val
end
return ary, i + 1
end

def decode_dict(data, i)
hsh = {}
while data[i..i] != "e"
key, i = decode_string(data, i)
val, i = decode(data, i)
hsh[key] = val
end
return hsh, i + 1
end

def decode(data, i)
case data[i..i]
when "i"
decode_integer(data, i + 1)
when "l"
decode_list(data, i + 1)
when "d"
decode_dict(data, i + 1)
else
decode_string(data, i)
end
end
end
end


Eventually I'll upload it to RubyForge. If you've got comments, bring
'em on, but remember that I only just started this today.


Cheers,
Daniel
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,818
Latest member
Brigette36

Latest Threads

Top