confused by 'test'.gsub(/.*/,'x')

Wybo Dekker

Why do I get "xx" instead of "x" in the following:

$ irb
irb> 'test'.gsub(/.*/,'x')
=> "xx"

and even more confusing (to me):
=> "yy\ny"

(I expected "y\n")
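
The command that produced "yy\ny" is missing from the post above. As a sketch, assuming the input was a string with a trailing newline such as "test\n" (a guess that fits the expected "y\n"), both results can be reproduced like this:

```ruby
# Assumption: the second input was "test\n"; the original command
# did not survive in the post, so this is only a plausible reconstruction.
first  = 'test'.gsub(/.*/, 'x')
second = "test\n".gsub(/.*/, 'y')

p first   # => "xx"
p second  # => "yy\ny"
```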
 
Thomas Wieczorek

> Why do I get "xx" instead of "x" in the following:
>
> $ irb
> irb> 'test'.gsub(/.*/,'x')
> => "xx"

.* matches NO and ALL characters, so gsub() substitutes
''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'

> and even more confusing (to me):
>
> => "yy\ny"

Same goes here as above. If you want to replace each character use
'test'.gsub(/./,'x') #=> 'xxxx'
or if you want to replace all characters in each line, use
"test\ntest".gsub(/.+/,'x') #=> "x\nx"
 
Yossef Mendelssohn

> .* matches NO and ALL characters, so gsub() substitutes
> ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'

That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
should match [empty string]test[empty string] just once.

The "yy\ny" result makes sense because . doesn't normally match \n, so
there's a replacement before and after the newline. Still, the double
replacement when there are actual characters is just weird.
 
Thomas Wieczorek

> > .* matches NO and ALL characters, so gsub() substitutes
> > ''(empty)(=>'x') and 'test'(=>'x') with x, so you get 'xx'
>
> That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x' more
> than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
> should match [empty string]test[empty string] just once.

Yeah, it is confusing me, but I agreed on that explanation with
myself, when I read it once here. I'd also expect 'x' instead of 'xx'
 
Jens Wille

Thomas Wieczorek [2008-04-02 22:59]:
> > > .* matches NO and ALL characters, so gsub() substitutes
> > > ''(empty)(=>'x') and 'test'(=>'x') with x, so you get
> > > 'xx'
> > That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x'
> > more than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me
> > that the .* should match [empty string]test[empty string] just once.
> Yeah, it is confusing me, but I agreed on that explanation with
> myself, when I read it once here. I'd also expect 'x' instead of 'xx'

can't explain it either, i'm afraid. but you can see what it does
like so:

irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
"test"
""
=>"xx"

as soon as you anchor the regexp at the beginning of the string it
gives the expected result:

irb> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
"test"
=>"x"

or just do:

irb> 'test'.sub(/.*/) { |m| p m; 'x'}
"test"
=>"x"

;-)

cheers
jens
 
Brian Adkins

> Thomas Wieczorek [2008-04-02 22:59]:
> > On Wed, Apr 2, 2008 at 10:55 PM, Yossef Mendelssohn
> > > On Apr 2, 3:35 pm, "Thomas Wieczorek"
> > > > .* matches NO and ALL characters, so gsub() substitutes
> > > > ''(empty)(=>'x') and 'test'(=>'x') with x, so you get
> > > > 'xx'
> > > That sounds like an explanation why ''.gsub(/.*/, 'x') is 'x'
> > > more than why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me
> > > that the .* should match [empty string]test[empty string] just once.
> > Yeah, it is confusing me, but I agreed on that explanation with
> > myself, when I read it once here. I'd also expect 'x' instead of 'xx'
>
> can't explain it either, i'm afraid. but you can see what it does
> like so:
>
> irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
> "test"
> ""
> =>"xx"

That seems like a bug to me. The entire string is matched/consumed
by .*, so why try matching again? Or, if you are going to continue,
why stop with just one additional match? Is there code in gsub to
"only match one time after the string is consumed" ?

irb(main):001:0> 'test' =~ /(.*)(.*)(.*)/
=> 0
irb(main):002:0> $1
=> "test"
irb(main):003:0> $2
=> ""
irb(main):004:0> $3
=> ""
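
One way to see where each gsub match starts and ends is to record the match offsets with String#scan, which walks the string the same way. This is only an illustrative sketch; the variable name `offsets` is made up here:

```ruby
# Collect [start, end] offsets of every match of /.*/ in 'test'.
offsets = []
'test'.scan(/.*/) { offsets << $~.offset(0) }

p offsets  # => [[0, 4], [4, 4]]
```

The second match is the empty string sitting at offset 4, the end of the string. After that the scan position cannot move any further, so there is no third match.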
 
Wybo Dekker

Jens said:
> as soon as you anchor the regexp at the beginning of the string it
> gives the expected result:
>
> irb> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
> "test"
> =>"x"
>
> or just do:
>
> irb> 'test'.sub(/.*/) { |m| p m; 'x'}
> "test"
> =>"x"

Sure, that works, and so does 'test'.gsub(/.+/,'x').
The point is that I don't understand why 'test'.gsub(/.*/,'x') gives me
'xx', since .* means zero or more of any character except the newline
character, i.e. all of the string should be replaced with a single x,
as far as I can see.
 
Januski, Ken

Seems wrong to me as well. If you do a destructive gsub and test for
individual letters, e.g. /s.*/,'x', you get 'tex' as you'd expect. Seems
wrong to get the double 'x' when you use your example. Of course my
background is Perl and I believe that's how it would work there.

irb(main):016:0> 'test'.gsub!(/.*/, 'x')
=> "xx"
irb(main):017:0> 'test'.gsub!(/e.*/, 'x')
=> "tx"
irb(main):018:0> 'test'.gsub!(/s.*/, 'x')
=> "tex"
irb(main):019:0> 'test'.gsub!(/t.*/, 'x')
=> "x"
irb(main):020:0> 'test'.gsub!(/st.*/, 'x')
=> "tex"

Ken


 
Jens Wille

Januski, Ken [2008-04-03 00:08]:
> Of course my background is Perl and I believe that's how it would
> work there.

no, works the same way there:

sh> perl -e '$s = "test"; $s =~ s/.*/x/g; print "$s\n"'
xx

(only a lot more complicated ;-)

btw: python, php and javascript, too.

oh, and here's what oniguruma does:

irb> Oniguruma::ORegexp.new('.*').gsub('test', 'x')
=>"xx"

cheers
jens
 
Bilyk, Alex

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The point is that I don't understand why test.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, ...
-
Wybo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I would venture to say this is exactly what it does. It finds two
matches and replaces them both with 'x'. The first match is an empty
string <zero>, while the second match is the full string <or more >.

Alex


 
Zoltan Dezso

Perl, PHP:

perl -le '$str="test"; $str =~ s/.*?/x/g; print $str;'
xxxxxxxxx

preg_replace('/.*?/', 'x', 'test');
xxxxxxxxx

Ruby:
print 'test'.gsub(/.*?/, 'x')
xtxexsxtx

Zaki
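
To see why Ruby prints xtxexsxtx here, it may help to record what the lazy pattern actually matches. A small sketch (the `matches` array is just for illustration):

```ruby
# Record every match of the lazy pattern while substituting.
matches = []
result = 'test'.gsub(/.*?/) { |m| matches << m; 'x' }

p matches  # => ["", "", "", "", ""]
p result   # => "xtxexsxtx"
```

In Ruby the lazy .*? settles for the empty string at each of the five positions, and the characters in between are copied through untouched. The nine x's from Perl and PHP suggest that those engines also retry for a non-empty match at the same position after an empty one, though that is an inference from the output above rather than something shown in this thread.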
 
Januski, Ken

Right you are. For all the years I've used Perl, and for all that I
thought I knew about regexes, I never would have thought I would get
that result.

I would have expected one greedy match for the entire text. Instead I
guess it's first getting the zero match and then the full match.



 
Peña, Botp

From: Wybo Dekker [mailto:[email protected]]
# sure, that works, and so does test.gsub(/.+/,'x').
# The point is that I don't understand why test.gsub(/.*/,'x') gives me
# 'xx', since .* means: zero or more of any character, except
# the newline
# character, i.e.: all of the string should be replaced with a
# single x, as far as I can see.

you can start (slowly) by comparing these two examples,

irb(main):077:0> ''.gsub(/.*/, 'x')
=> "x"

irb(main):078:0> ''.gsub(/.+/, 'x')
=> ""

kind regards -botp
 
Daniel Sheppard

> I would have expected one greedy match for the entire text.
> Instead I guess it's first getting the zero match and then
> the full match.

Actually, it's vice versa. It matches the whole string (greedy), then
matches the end of string. The "test" string is seen by the regex engine
as:

test<end of string>

.* first matches "test". <end of string> is a special 'character' that
is not consumed by ".", so the remaining string is then "<end of
string>". This is also matched, as it contains zero or more characters
(but it is not then matched infinitely, because the position in the
string has not advanced).

Dan.
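
Dan's description can be sketched as a simple loop. The helper below, naive_gsub, is hypothetical and is not Ruby's actual implementation; it only assumes the usual rule that after an empty match the scan position is bumped by one character so the loop cannot match at the same spot forever:

```ruby
# Hypothetical re-implementation of a global substitute, for illustration only.
def naive_gsub(str, pattern, replacement)
  result = +''
  pos = 0
  while pos <= str.length
    m = pattern.match(str, pos)          # next match at or after pos
    break unless m
    # Copy the unmatched text before the match, then the replacement.
    result << str[pos...m.begin(0)] << replacement
    if m.end(0) == m.begin(0)
      # Empty match: copy one character through and step past it,
      # otherwise the same empty match would be found again.
      result << str[m.end(0), 1].to_s
      pos = m.end(0) + 1
    else
      pos = m.end(0)
    end
  end
  result << str[pos..-1].to_s            # whatever is left after the last match
  result
end

p naive_gsub('test', /.*/, 'x')    # => "xx"
p naive_gsub('test', /.+/, 'x')    # => "x"
p naive_gsub("test\n", /.*/, 'y')  # => "yy\ny"
```

With /.*/ the first pass consumes "test" and leaves the position at 4, where one more (empty) match succeeds. With /.+/ nothing can match at position 4, so there is only one replacement.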
 
