Daniel Schierbeck said:
I'm trying to write a regular expression that matches bencoded
strings, i.e. strings of the form x:y, where x is the numeric length
of y.
This is valid:
6:foobar
while this is not:
4:foo
I don't think that what you want to do is possible with a mere regular
expression.
It might be possible using Perl's special
evaluate-code-while-in-regexp feature, (??{ code }), but not in any
language whose regular expressions cannot call back into the host
language during matching.
The problem is that you would have to leave a crucial part of the
regexp uncompiled until the first half of it has matched, since the
number of characters to match for y is only known once x has been
read, and this is not possible.
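To make the point concrete, here is a minimal sketch of doing the check in two steps instead: a regexp recognizes the length prefix, and ordinary Ruby code compares it with the size of what follows. The helper name bencoded_string? is made up for illustration, and it assumes the length counts bytes and that the whole input is a single x:y string.

```ruby
# Hypothetical helper, not part of the scanner posted in this thread.
# Returns true when str is exactly "<n>:<payload>" and payload is n bytes long.
def bencoded_string?(str)
  # Step 1: a plain regexp can recognize the decimal length prefix.
  m = /\A(\d+):/.match(str)
  return false unless m

  # Step 2: the length comparison happens in Ruby, outside the regexp.
  str.bytesize - m[0].bytesize == m[1].to_i
end

puts bencoded_string?("6:foobar")  # prints true
puts bencoded_string?("4:foo")     # prints false
```

The regexp never needs to know the length; it only extracts it.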
But parsing bencoded data isn't that hard; here's something I just
whipped up that should handle it:
require 'strscan'

class BencodeScanner
  # Decode one complete bencoded value; anything left over is an error.
  def self.scan(str)
    scanner = StringScanner.new(str)
    r = doscan_internal(scanner, false)
    raise "Malformed Bencoded String" unless scanner.eos?
    r
  end

  # One cached regexp per length k: a colon followed by exactly k characters.
  @@string_regexps = Hash.new { |h, k| h[k] = /:.{#{k}}/m }

  # Decode the next value. Returns nil for the 'e' that ends a list or
  # dictionary, which is only legal when allow_e is true.
  def self.doscan_internal(scanner, allow_e = true)
    tok = scanner.scan(/\d+|[ilde]/)
    case tok
    when nil
      raise "Malformed Bencoded String"
    when 'e'
      raise "Malformed Bencoded String" unless allow_e
      return nil
    when 'l'
      retval = []
      while arritem = doscan_internal(scanner)
        retval << arritem
      end
      return retval
    when 'd'
      retval = {}
      while key = doscan_internal(scanner)
        val = doscan_internal(scanner, false)
        retval[key] = val
      end
      return retval
    when 'i'
      raise "Malformed Bencoded String" unless scanner.scan(/-?\d+e/)
      return scanner.matched[0, scanner.matched.length - 1].to_i
    else
      # tok is the decimal length prefix of a string.
      raise "Malformed Bencoded String" unless scanner.scan(@@string_regexps[tok])
      return scanner.matched[1, tok.to_i]
    end
  end
  # A bare `private` does not apply to class methods, so hide the helper explicitly.
  private_class_method :doscan_internal
end
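One caveat about the string branch: on current Ruby versions the per-length regexp counts characters, while bencode lengths are byte counts, so the two can disagree on multibyte data. A possible alternative, sketched here under that assumption, is to read the prefix with a regexp and then slice the payload by byte position. The method name scan_bencoded_string is hypothetical and is not part of the class in this post.

```ruby
require 'strscan'

# Hypothetical helper: reads one bencoded string ("<n>:<n bytes>") starting
# at the scanner's current position and returns its payload.
def scan_bencoded_string(scanner)
  # The regexp only has to recognize the decimal length and the colon.
  raise "Malformed Bencoded String" unless scanner.scan(/(\d+):/)
  length = scanner[1].to_i

  # StringScanner#pos is a byte offset, so the remaining size is in bytes too.
  remaining = scanner.string.bytesize - scanner.pos
  raise "Malformed Bencoded String" if remaining < length

  value = scanner.string.byteslice(scanner.pos, length)
  scanner.pos += length  # skip over the payload just read
  value
end

s = StringScanner.new("6:foobar3:abc")
puts scan_bencoded_string(s)  # prints foobar
puts scan_bencoded_string(s)  # prints abc
```

This also avoids compiling a new regexp for every distinct string length.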