sub/gsub bug?

Matt

Is this a bug?

irb(main):032:0> "matt's test".sub(/'/, "\\'")
=> "matts tests test"

It's trying to replace a single quote with an escaped quote.

Thanks in advance,
Matt
 
Matt

Phlip said:
Until someone dissects the source and finds (or adds) a better fix, here's
a workaround:

Thanks for the reply, Phlip. As it turns out, when I was researching
this problem I found out that you can use replaceable parameters in
DBI queries instead of worrying about escaping quotes, so that's a
relief. I was just kind of curious why gsub() wasn't working the way
I thought it would.

Phlip said:
irb(main):012:0> "matt's test".sub(/'/){"\\'"}
=> "matt\\'s test"

That surprised me too, because I thought that sub's block form also
interpolated \n group inserters, but I suppose that blocks have room for
true $1 match references. \1 is often available in some regular expressions
that cram all the searchers and replacers into one huge line, so a $1 is
naturally not available inside its own source expression. (I also don't
happen to know if Ruby supports such super-regices!)

Yeah, it's strange that they would write two replacement-string
parsers, but I guess there's probably a good reason for it.
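For comparison, a minimal sketch of the two forms using only core String#sub: with a string argument, sub scans the replacement for backslash sequences such as \1 and \2, while a block's return value is inserted verbatim and the block reads the match through $1, $2 instead.

```ruby
# String form: \2 and \1 in the replacement are group references.
via_string = "a-b".sub(/(\w)-(\w)/, "\\2-\\1")
puts via_string   # b-a

# Block form: the block's return value is inserted as-is, so the
# same text comes out as literal backslashes and digits.
verbatim = "a-b".sub(/(\w)-(\w)/) { "\\2-\\1" }
puts verbatim     # \2-\1

# Inside the block, the real match variables are available instead.
via_block = "a-b".sub(/(\w)-(\w)/) { "#{$2}-#{$1}" }
puts via_block    # b-a
```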

Super-regices? You mean like s/xxx/yyy/g? I haven't seen that
anywhere in my Ruby travels, but I'm kind of new.

Matt
 
J. Cooper

Matt said:
Is this a bug?

irb(main):032:0> "matt's test".sub(/'/, "\\'")
=> "matts tests test"

It's trying to replace a single quote with an escaped quote.

Thanks in advance,
Matt

irb(main):002:0> puts "matt's test".sub(/'/, "\\\\'")
matt\'s test
=> nil

The gsub replacement string is a crazy thing when it comes to escaping,
because remember you can put backreferences in it. Any double quoted
string needs 2 backslashes to represent a literal backslash, but
remember in this context, a literal backslash represents the
backreference syntax, so you need to escape it with another "literal
backslash" of 2 backslashes.
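To make the two levels concrete, a small sketch that first checks how many characters each literal really contains, then shows what sub does with them:

```ruby
# Step 1: Ruby's double-quoted literal turns each "\\" into ONE backslash.
two   = "\\'"    # characters: \ '
three = "\\\\'"  # characters: \ \ '
puts two.length    # 2
puts three.length  # 3

# Step 2: sub's replacement parser then reads those characters.
#   \'  -> the text after the match
#   \\  -> one literal backslash
post_match = "matt's test".sub(/'/, two)
escaped    = "matt's test".sub(/'/, three)
puts post_match  # matts tests test
puts escaped     # matt\'s test
```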

At least, I think that's how I understand it :p
 
Matt

J. Cooper said:
The gsub replacement string is a crazy thing when it comes to escaping,
because remember you can put backreferences in it. Any double quoted
string needs 2 backslashes to represent a literal backslash, but
remember in this context, a literal backslash represents the
backreference syntax, so you need to escape it with another "literal
backslash" of 2 backslashes.

At least, I think that's how I understand it :p

Yeah, I get all that, but I didn't think that \' was a backreference.
I thought it was limited to numbers and the ampersand. But now that I
think about it, Perl has a $' variable that contains the portion of
the string after the match, so I guess that's where the idea came
from. I've never seen that in backreference form before.

irb(main):005:0> "matt's test".gsub(/'/, "\\`")
=> "mattmatts test"
irb(main):006:0> "matt's test".gsub(/'/, "\\&")
=> "matt's test"
irb(main):007:0> "matt's test".gsub(/'/, "\\'")
=> "matts tests test"
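Ruby also has the Perl-style globals $`, $& and $', so the Perl connection seems right. A small sketch relating the three replacement escapes to the pieces of a MatchData (pre_match, the match itself, post_match):

```ruby
s  = "matt's test"
md = s.match(/'/)

# The three pieces of the string around the match:
puts md.pre_match   # matt
puts md[0]          # '
puts md.post_match  # s test

# \`, \& and \' in a replacement splice in those same pieces.
before = s.sub(/'/, "\\`")  # "matt" + "matt" + "s test"
whole  = s.sub(/'/, "\\&")  # unchanged
after  = s.sub(/'/, "\\'")  # "matt" + "s test" + "s test"

# The block form can do the same with the Perl-style globals,
# which sub sets for the current match before calling the block.
before_block = s.sub(/'/) { $` }
puts before == before_block  # true
```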

Yay!
Matt
 
Robert Klemme

Phlip said:
To understand the bug, try this:

There is no bug.

Phlip said:
irb(main):009:0> "matt's test".sub(/(')/, '\1')
=> "matt's test"

The 'single ticks' interpret \1 as a literal \ and a 1, not an escape. Then
.sub sees the \1 and replaces it with the first group from the Regexp,
which is a literal tick '.

Now try it with "double quotes":

irb(main):005:0> "matt's test".sub(/(')/, "\\1")
=> "matt's test"

The " forces us to escape the \ as \\ to get a literal \.
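A quick sketch of the quoting point: single quotes leave \1 as a backslash followed by 1 (only \\ and \' are escapes there), while double quotes need \\ to produce that same backslash, so both literals hand sub the identical two-character string.

```ruby
single = '\1'   # single quotes: backslash followed by "1"
double = "\\1"  # double quotes: "\\" is one backslash, then "1"

puts single == double  # true
puts single.length     # 2

# Either literal gives sub the sequence \1, i.e. "insert group 1",
# and group 1 is the quote that was matched.
same = "matt's test".sub(/(')/, single)
puts same              # matt's test
```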

And a literal \ in a replacement string is a meta character. To get a
literal backslash in a replacement, you need \\ in the string and consequently
\\\\ in a double quoted string.

Phlip said:
So your \\' indeed goes in as a literal \'. But the group replacer still
sees a \, and snarfs it, expecting a number after it. I would call that a
bug (essentially because I might personally be capable of correctly parsing
a \1!)

No, this is not a bug and it is not the "group replacer". The
replacement string is parsed by the regexp engine and since the
backslash does not escape a character that has meta capabilities (such
as & and 1..9) it is silently discarded.

Phlip said:
Until someone dissects the source and finds (or adds) a better fix, here's
a workaround:

irb(main):012:0> "matt's test".sub(/'/){"\\'"}
=> "matt\\'s test"

No, the proper way to do it is this:

irb(main):002:0> "matt's test".sub(/'/, "\\\\'")
=> "matt\\'s test"
irb(main):003:0> puts "matt's test".sub(/'/, "\\\\'")
matt\'s test
=> nil

irb(main):006:0> "matt's test".sub(/'/, '\\\\\'')
=> "matt\\'s test"
irb(main):007:0> puts "matt's test".sub(/'/, '\\\\\'')
matt\'s test
=> nil
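Robert's rule (double every backslash that should survive) can be applied mechanically when the replacement text comes from elsewhere. A minimal sketch; literal_replacement is a hypothetical helper name, not part of Ruby:

```ruby
# Hypothetical helper: double each backslash so that the replacement
# parser turns every pair back into one literal backslash.
# A block is used here so the doubled text is not itself parsed.
def literal_replacement(text)
  text.gsub("\\") { "\\\\" }
end

s   = "matt's test"
rep = literal_replacement("\\'")  # 2 chars (\ ') become 3 chars (\ \ ')
puts rep.length                   # 3
puts s.sub(/'/, rep)              # matt\'s test

# Text that looks like a group reference is also kept literally.
puts "x".sub(/x/, literal_replacement("\\1"))  # \1
```

When no backreferences are needed at all, passing a block, as Phlip did, sidesteps replacement parsing entirely and is usually simpler.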

Blocks are only needed if you need to do some calculations based on the
match, e.g.

irb(main):008:0> "There is 1 number in this string".gsub(/\d+/) {|m| m.to_i * 34}
=> "There is 34 number in this string"

Phlip said:
That surprised me too, because I thought that sub's block form also
interpolated \n group inserters, but I suppose that blocks have room for
true $1 match references.

Yes.

Phlip said:
\1 is often available in some regular expressions
that cram all the searchers and replacers into one huge line, so a $1 is
naturally not available inside its own source expression. (I also don't
happen to know if Ruby supports such super-regices!)

I am not sure what you mean here. The reason $1 cannot be used in
regular expressions and replacement strings is that it is a variable
that gets its value before the regexp is created (of course you can use
it but it cannot refer to matching of the regexp you put it into).
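A minimal sketch of that point: $1 is read when the regexp literal is built (here via #{...} interpolation), so it holds the result of the previous match, not of the match the new regexp will perform.

```ruby
"key=value".match(/(\w+)=/)  # this match sets $1 to "key"

# #{$1} is evaluated now, while the regexp is being built,
# so the new pattern is simply /key:/.
re = /#{$1}:/
puts re.source   # key:

pos = "key:1" =~ re
puts pos         # 0
```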

You can use groups inside the regular expression as well as in the
replacement string if this is what you mean.

irb(main):017:0> "aba abc".gsub(/(.)(b)\1/, "[\\1]<\\2>[\\1]")
=> "[a]<b>[a] abc"
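Assuming Ruby 1.9 or later, the same example can be written with named groups: \k<name> refers back to a named group both inside the pattern and in the replacement string. A sketch mirroring the example above:

```ruby
# (?<x>...) names a group; \k<x> inside the pattern refers back to it.
re = /(?<x>.)(?<y>b)\k<x>/

# In a double-quoted replacement, "\\k<x>" reaches gsub as \k<x>.
result = "aba abc".gsub(re, "[\\k<x>]<\\k<y>>[\\k<x>]")
puts result  # [a]<b>[a] abc
```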

Kind regards

robert
 
Robert Klemme

There is no bug.


And a literal \ in a replacement string is a meta character. To get a
literal backslash in a replacement, you need \\ in the string and consequently
\\\\ in a double quoted string.


No, this is not a bug and it is not the "group replacer". The
replacement string is parsed by the regexp engine and since the
backslash does not escape a character that has meta capabilities (such
as & and 1..9) it is silently discarded.
^^^^^^^^^^^^^^^^^^

This was wrong, of course, as Matt demonstrated:

irb(main):019:0> "abc".gsub(/b/, "<\\'>")
=> "a<c>c"

But the backslash is simply retained if it is not followed by a
meta-capable character:

irb(main):020:0> "abc".gsub(/b/, "<\\a>")
=> "a<\\a>c"
irb(main):021:0> puts "abc".gsub(/b/, "<\\a>")
a<\a>c
=> nil
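A small sketch summarising the behaviour described above on the same input; wrap is a hypothetical helper that only puts the replacement inside <...> so the inserted text is easy to spot:

```ruby
# Hypothetical helper: replace the "b" in "abc", wrapping the
# replacement in <...> so the inserted text stands out.
def wrap(rep)
  "abc".gsub(/b/, "<#{rep}>")
end

puts wrap("\\'")   # a<c>c    \' : text after the match
puts wrap("\\0")   # a<b>c    \0 : the whole match
puts wrap("\\\\")  # a<\>c    \\ : one literal backslash
puts wrap("\\a")   # a<\a>c   \a : not special, backslash kept
```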

Sorry for the added confusion.

Kind regards

robert
 
