Explain this ruby regex

renton.dan · Oct 3, 2008

Can someone explain this regex ...

"one two".scan(/\w*/).length

returns 4. I can see it matching the two words and the space; what else
is it matching? Is there a null terminator? I thought Ruby strings
were not null-terminated.

Ben Bleything · Oct 3, 2008

Try replacing #length with #inspect and seeing what the output of scan
is. You'll find that it's returning two empty strings as well. I
suspect what you really want is \w+...
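
A minimal sketch of the difference, using only the string from the question: "*" allows zero-length matches, while "+" requires at least one word character. The variable names are just for illustration.

```ruby
text = "one two"

# \w* may match zero characters, so scan also reports empty matches.
star_matches = text.scan(/\w*/)

# \w+ needs at least one word character, so only the words come back.
plus_matches = text.scan(/\w+/)

p star_matches  # => ["one", "", "two", ""]
p plus_matches  # => ["one", "two"]
```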

Ben

renton.dan · Oct 3, 2008

Yeah, you're right, \w+ will pull out the words, which is what I want
anyway. I'm still trying to understand what \w* is doing, though.

irb(main):015:0> "one two".scan(/\w*/).inspect
=> "[\"one\", \"\", \"two\", \"\"]"

My question is: what is the last \"\", and where does it come from?

Patrick He · Oct 3, 2008

\w* does not match the space between the strings "one" and "two". It matches
"one", <empty string after "one">, "two", <empty string after "two">.

Here are some other examples:

irb(main):004:0> "one".scan(/^\w*/)
=> ["one"]
irb(main):005:0> "one".scan(/\w*$/)
=> ["one", ""]

--
Patrick

Patrick Doyle · Oct 3, 2008

The key idea here is that "*" means "match zero or more of" whereas "+"
means "match one or more of". So, when you match \w* against "one two",
the first attempt finds zero or more word characters (3, in fact: 'o', 'n',
and 'e'), which produces one result. The next attempt starts at the space,
where there are zero word characters, but since you asked for "zero or more
of", you get an empty string as a result. Rinse and repeat for the "two"
part: the last empty string is the zero-length match at the end of the
string, right after "two".
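
To see where each match starts, here is a rough sketch that records the offset of every match scan reports. The helper name matches_with_offsets is made up for this example and is not part of Ruby; it relies on Regexp.last_match being set for each match inside the scan block.

```ruby
# Hypothetical helper (not part of Ruby): returns [offset, text] for
# every match that String#scan reports.
def matches_with_offsets(string, pattern)
  found = []
  string.scan(pattern) do
    match = Regexp.last_match   # MatchData for the current match
    found << [match.begin(0), match[0]]
  end
  found
end

p matches_with_offsets("one two", /\w*/)
# => [[0, "one"], [3, ""], [4, "two"], [7, ""]]
```

Offset 3 is the space and offset 7 is the end of the string, which is where the two empty strings come from.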

FWIW, instead of looking at the result with #inspect, I found it more
informative to look at the result returned from #scan by itself, e.g.

irb> "one two".scan(/\w*/)
=> ["one", "", "two", ""]

--wpd

Brian Candler · Oct 3, 2008

irb displays the value of an expression using #inspect, so you are using
inspect even though you didn't ask for it.
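
A small sketch of why the earlier output had backslashes in it: calling #inspect yourself hands irb a String, and irb then inspects that String a second time. The variable names here are only for illustration.

```ruby
words = "one two".scan(/\w*/)

once  = words.inspect          # what irb prints for the bare expression
twice = words.inspect.inspect  # what irb prints if you call #inspect yourself

puts once   # prints: ["one", "", "two", ""]
puts twice  # prints: "[\"one\", \"\", \"two\", \"\"]"
```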

Robert Klemme · Oct 5, 2008

It boils down to this statement: a subexpression with "*" potentially
matches an _empty string anywhere_ in a string.
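
As an illustration of that statement (a sketch with made-up inputs, not something from the posts above), a starred pattern succeeds even where the character it names never occurs:

```ruby
# The pattern x* finds no "x" in "abc", yet it still matches: once
# before each character and once at the end, each time with length zero.
empty_matches = "abc".scan(/x*/)
p empty_matches                 # => ["", "", "", ""]

# gsub makes the positions visible by writing a dash at every match.
marked = "abc".gsub(/x*/, "-")
p marked                        # => "-a-b-c-"

# =~ returns the offset of the first match, which is 0 even though the
# string contains no digits at all.
first_offset = ("one two" =~ /\d*/)
p first_offset                  # => 0
```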

Kind regards

robert
