Regex - Exclude Multiple Characters and Global Scanning

Ben Woodcroft

Hihi,

I have 2 problems.

--------------Question 1-----------------------
Firstly, a Ruby question. I'm confused about how to match a single
regular expression multiple times in a single string. For instance,

'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
'llgllallo'.match(/(ll.)/)[2] #-> nil

How do I access all 3 matches? String#scan will work, but that gives me

'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]

But I need the offsets, and this info isn't given to me.



--------------Question 2-----------------------
Now an old gap in my regex understanding. How do I exclude a sequence of
consecutive characters? I want something like [^abc], except that 'aba' or
'bbc' is OK, just not 'abc'. Summing this up:

reg = /something/
'abc'.match(reg) #-> no match
'cba'.match(reg) #-> match

And then I want to be able to do OR operations too, like not 'abc' and
not 'bbc', but that is probably another step of complexity.

I don't suppose there is any way to pass a block to the regex to use in
a specific place? That would be cool, though maybe not possible given
optimisations in regex?



Thanks in advance,
ben
 
David A. Black

Hi --

Hihi,

I have 2 problems.

--------------Question 1-----------------------
Firstly, a Ruby question. I'm confused about how to match a single
regular expression multiple times in a single string. For instance,

'llgllallo'.match(/(ll.)/)[0] #-> 'llg'
'llgllallo'.match(/(ll.)/)[1] #-> 'llg'
'llgllallo'.match(/(ll.)/)[2] #-> nil

How do I access all 3 matches? String#scan will work, but that gives me

'llgllallo'.scan(/(ll.)/) #=> [["llg"], ["lla"], ["llo"]]

But I need the offsets, and this info isn't given to me.

You could do:

irb(main):029:0> offsets = []
=> []
irb(main):030:0> str.scan(/ll./) { offsets << $~.offset(0)[1] }
=> "llgllallo"
irb(main):031:0> offsets
=> [3, 6, 9]

(Pending someone coming up with something slicker. I don't like the
temp variable particularly, but anyway.)
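
The session above collects the offset at which each match ends. If the start offsets are wanted as well, a minimal sketch along the same lines might look like this. It assumes `str` holds the example string from the question, and the variable names are only illustrative. `Regexp.last_match` is the same MatchData that `$~` refers to.

```ruby
str = 'llgllallo'

# Collect a [start, end] pair of character offsets for every match.
# Inside a scan block, Regexp.last_match is the MatchData for the
# match currently being yielded.
offsets = []
str.scan(/ll./) { offsets << Regexp.last_match.offset(0) }

starts = offsets.map(&:first) # where each match begins
ends   = offsets.map(&:last)  # where each match ends
```

For this string, `offsets` comes out as `[[0, 3], [3, 6], [6, 9]]`, so `ends` agrees with the `[3, 6, 9]` shown in the irb session.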
--------------Question 2-----------------------
Now an old gap in my regex understanding. How do I exclude a sequence of
consecutive characters? I want something like [^abc], except that 'aba' or
'bbc' is OK, just not 'abc'. Summing this up:

[^abc] means: match one character that is not 'a', not 'b', and not
'c'. I don't think that's what you mean.

reg = /something/
'abc'.match(reg) #-> no match
'cba'.match(reg) #-> match

And then I want to be able to do OR operations too, like not 'abc' and
not 'bbc', but that is probably another step of complexity.

You can use (?!), which is negative lookahead.

irb(main):033:0> reg = /(?!abc)[abc]{3}/
=> /(?!abc)[abc]{3}/

So that means: three of a, b, c, as long as we're not looking at
"abc" when we start looking for those three characters.

irb(main):034:0> reg.match("abc")
=> nil
irb(main):035:0> reg.match("abb")
=> #<MatchData:0x69de8>
irb(main):036:0> reg.match("cba")
=> #<MatchData:0x63de4>

I don't suppose there is any way to pass a block to the regex to use in
a specific place? That would be cool, though maybe not possible given
optimisations in regex?

Blocks get passed to methods, not objects, and regexes are objects.
Some of the methods that use regexes also take blocks, like scan, sub,
and gsub. I'm not sure what you mean about the specific place, though.


David
 
Ben Woodcroft

David said:
You could do:

irb(main):029:0> offsets = []
=> []
irb(main):030:0> str.scan(/ll./) { offsets << $~.offset(0)[1] }
=> "llgllallo"
irb(main):031:0> offsets
=> [3, 6, 9]

(Pending someone coming up with something slicker. I don't like the
temp variable particularly, but anyway.)

That will work, thanks. It would seem intuitive to me that scan (or a
method like it) would iterate over MatchData objects, but anyway. Thanks.
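
One way to get that behaviour without a temp variable is to wrap `scan` in an enumerator and collect `Regexp.last_match` for each step. This is only a sketch of a commonly used idiom, not something built into `scan` itself, and the names `str` and `matches` are illustrative.

```ruby
str = 'llgllallo'

# to_enum(:scan, pattern) yields once per match. Inside the map block,
# Regexp.last_match is the MatchData for the match just found.
matches = str.to_enum(:scan, /ll./).map { Regexp.last_match }

# Each element is a full MatchData, so text and offsets are both available.
matches.each do |m|
  puts "#{m[0]} at #{m.begin(0)}...#{m.end(0)}"
end
```

Because each element is a MatchData object, captures and pre/post-match text are available too, not just the offsets.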
--------------Question 2-----------------------
Now an old gap in my regex understanding. How do I exclude a sequence of
consecutive characters? I want something like [^abc], except that 'aba' or
'bbc' is OK, just not 'abc'. Summing this up:

[^abc] means: match one character that is not 'a', not 'b', and not
'c'. I don't think that's what you mean.

reg = /something/
'abc'.match(reg) #-> no match
'cba'.match(reg) #-> match

And then I want to be able to do OR operations too, like not 'abc' and
not 'bbc', but that is probably another step of complexity.

You can use (?!), which is negative lookahead.

irb(main):033:0> reg = /(?!abc)[abc]{3}/
=> /(?!abc)[abc]{3}/

So that means: three of a, b, c, as long as we're not looking at
"abc" when we start looking for those three characters.

irb(main):034:0> reg.match("abc")
=> nil
irb(main):035:0> reg.match("abb")
=> #<MatchData:0x69de8>
irb(main):036:0> reg.match("cba")
=> #<MatchData:0x63de4>

That is exactly what I meant. I was unaware of the negative lookahead
operator. Thanks!
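
For the OR case from the original question (not 'abc' and not 'bbc'), the lookahead can hold an alternation. The following is a small sketch under that assumption. The pattern is not taken from the replies above, and the test strings are only illustrative.

```ruby
# Three characters from a, b, c, unless those three are exactly "abc" or
# "bbc". The alternation inside (?!...) rejects either sequence at the
# position where the match would start.
reg = /(?!abc|bbc)[abc]{3}/

%w[abc bbc cba abb].each do |s|
  puts "#{s}: #{reg.match(s) ? 'match' : 'no match'}"
end
```

Note that the pattern is unanchored, so on a longer string such as 'abca' it can still match starting at a later position ('bca'). Wrapping it as /\A(?!abc|bbc)[abc]{3}\z/ restricts it to whole three-character strings.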
Blocks get passed to methods, not objects, and regexes are objects.
Some of the methods that use regexes also take blocks, like scan, sub,
and gsub. I'm not sure what you mean about the specific place, though.

My question was not explained very well, sorry. I meant it would be cool
if you could pass a block that became part of the regex itself. For
instance instead of /(?!abc)/ you could somehow tell it
{|s| s != 'abc'}

Just an idea, doesn't really matter now you've fixed my problem.
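
Something close to that idea can be approximated outside the regex: match loosely, then let an ordinary block decide which matches to keep. This is a rough sketch rather than a drop-in replacement for the lookahead, and the sample string is made up for illustration.

```ruby
str = 'abccbaabb'

# Match any run of three characters from a, b, c, then filter the
# results with a block that plays the role of (?!abc).
kept = str.scan(/[abc]{3}/).select { |s| s != 'abc' }
```

Here `kept` is `["cba", "abb"]`. The behaviour is not identical to /(?!abc)[abc]{3}/: `scan` consumes the rejected 'abc' and moves on, whereas the lookahead version would retry from the next character and find different matches in the same string.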

Thanks,
ben
 
