escaping single quotes in a string with gsub

Paul Rubel

Hi,

I'm trying to take a string and escape a single quote if it is not
already escaped. My first thought was to look at the string and if I
see a quote without a backslash before it put the backslash there.

This has a problem when there is an escaped slash before the quote:
\\'. I believe the fix should be to look two characters back. If
anyone has a canned solution I'm all ears. Would look-behind be an
option here out of the box?

While I was experimenting I saw some behavior I don't understand and
am hoping someone can explain it to me:

prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
2.times do
# replace not a slash followed by a quote with not a slash
# and an escaped quote.
puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))
puts $1
end
#end
\'Summe*\'*s Day
r
\'Summe*r\'*s Day
r
prubel /tmp> ruby --version
ruby 1.8.1 (2004-02-06) [i686-linux-gnu]


I'm confused as to why the output is different for the two
iterations. Why doesn't the r get placed in the first output?


thank you for your help,
Paul
 
James Edward Gray II

Hi,

I'm trying to take a string and escape a single quote if it is not
already escaped. My first thought was to look at the string and if I
see a quote without a backslash before it put the backslash there.

What about:

gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }
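A quick way to see what this does is to run it on quotes preceded by zero, one, and two backslashes. The sketch below wraps the line above in a helper; the method name escape_single_quotes is made up for illustration.

```ruby
# Hypothetical wrapper around the one-liner above.
# A quote preceded by an even number of backslashes (including zero)
# is unescaped, so it gets a backslash; an odd count means it is
# already escaped and the match is returned unchanged.
def escape_single_quotes(str)
  str.gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }
end

puts escape_single_quotes("Summer's Day")      # Summer\'s Day
puts escape_single_quotes("Summer\\'s Day")    # Summer\'s Day (unchanged)
puts escape_single_quotes("Summer\\\\'s Day")  # Summer\\\'s Day
```

Because the block's return value is inserted literally, "\\'" here is just a backslash and a quote, with no extra doubling needed.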
Would look-behind be an option here out of the box?

Surprisingly, I don't believe Ruby yet supports lookbehind.
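That holds for the 1.8 series. Ruby 1.9 and later use a different regex engine that does accept negative lookbehind, so something like the following sketch should work there. The method name is hypothetical and the pattern is only one possible way to write it, not something proposed in this thread.

```ruby
# Assumes Ruby 1.9 or later, where (?<!...) is supported.
# (?<!\\)      the run of backslashes must not itself follow a backslash
# ((?:\\\\)*)  zero or more PAIRS of backslashes (escaped backslashes)
# '            the unescaped quote
def escape_with_lookbehind(str)
  str.gsub(/(?<!\\)((?:\\\\)*)'/, "\\1\\\\'")
end

puts escape_with_lookbehind("Summer's Day")      # Summer\'s Day
puts escape_with_lookbehind("Summer\\'s Day")    # Summer\'s Day (unchanged)
puts escape_with_lookbehind("Summer\\\\'s Day")  # Summer\\\'s Day
puts escape_with_lookbehind("'quoted'")          # \'quoted\'
```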
While I was experimenting I saw some behavior I don't understand and
am hoping someone can explain it to me:

prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
2.times do
# replace not a slash followed by a quote with not a slash
# and an escaped quote.
puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))

The above line is problematic for two reasons. First, when using the
replacement string version of gsub(), your string is interpolated
before the method is even called let alone before any matches are made
so $1 and friends are not set. Instead, try using a \1 in a single
quoted string or \\1 in a double to get the value you're after.
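A minimal sketch of that suggestion, using the pattern from the script with the redundant backslashes removed. The method name star_escape is invented for the example; the asterisks are the markers from the original script.

```ruby
def star_escape(str)
  # In a double-quoted replacement, "\\1" reaches gsub as \1 (group 1)
  # and "\\\\" reaches it as \\, which gsub emits as one backslash.
  str.gsub(/([^\\])'/, "*\\1\\\\'*")
end

puts star_escape("\\'Summer's Day")  # \'Summe*r\'*s Day
```

Here the r is filled in by gsub itself for each match, so the result no longer depends on what $1 happened to hold beforehand.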

Two, I don't understand your pattern. [^\\\\] means ONE character that
is not a slash and also not a slash. It's identical to [^\\]. I think
you meant to say, not two slashes, but that's a little harder to
express in a regex. And what if there are three slashes? See my
solution above for a different approach.
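The equivalence is easy to check: inside a character class the second escaped backslash just repeats the first. A small sketch (the constant names are made up, and Ruby may warn about the duplicated range when run with -w):

```ruby
DOUBLED = /[^\\\\]/   # the class as written in the script
SINGLE  = /[^\\]/     # the same class with the backslash listed once

SAMPLE = "a\\b\\\\c'd"   # the characters a\b\\c'd

p SAMPLE.scan(DOUBLED) == SAMPLE.scan(SINGLE)  # true
p SAMPLE.scan(SINGLE)                          # ["a", "b", "c", "'", "d"]
```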
puts $1
end
#end
\'Summe*\'*s Day
r
\'Summe*r\'*s Day
r
prubel /tmp> ruby --version
ruby 1.8.1 (2004-02-06) [i686-linux-gnu]


I'm confused as to why the output is different for the two
iterations. Why doesn't the r get placed in the first output?

Because $1 isn't set in time for the first replacement, but it is set
when the second string is built (set by the first match).
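The timing can be reproduced in isolation. In the sketch below (the method name is hypothetical) the replacement string is built before each gsub call, so it only ever sees the $1 left behind by the previous call.

```ruby
def interpolation_timing
  results = []
  2.times do
    # "#{$1}" is evaluated here, before gsub starts matching.
    # First pass: no match has happened in this method yet, so $1 is nil.
    # Second pass: $1 still holds "r" from the previous gsub.
    results << "Summer's Day".gsub(/([^\\])'/, "*#{$1}\\\\'*")
  end
  results
end

puts interpolation_timing
# Summe*\'*s Day
# Summe*r\'*s Day
```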

Hope that helps.

James Edward Gray II
 
David A. Black

Hi --

Hi,

I'm trying to take a string and escape a single quote if it is not
already escaped. My first thought was to look at the string and if I
see a quote without a backslash before it put the backslash there.

This has a problem when there is an escaped slash before the quote:
\\'. I believe the fix should be to look two characters back. If
anyone has a canned solution I'm all ears. Would look-behind be an
option here out of the box?

I admit I get confused by escaping and stuff... but I can't quite
picture the case you're describing. If a string contains a single
quote:

"abc'def"

that's the same as:

"abc\'def"

So I don't think you'll actually see that backslash before the single
quote when you scan the string.

If you do see a slash -- i.e., if the string is:

abc\'def

then that would probably be generated with "abc\\'def", which would be
equivalent to:

"abc\\\'def"

I'm afraid I didn't quite follow the Summer's Day example. Can you
give another?


David
 
Paul Rubel

David,

I admit I get confused by escaping and stuff... but I can't quite
picture the case you're describing. If a string contains a single
quote:

"abc'def"

that's the same as:

"abc\'def"

So I don't think you'll actually see that backslash before the single
quote when you scan the string.

Insightful. I suspect you're right and that I made things needlessly
complicated (at least at this point). When my code sees the string the
escaping should already have occurred.

If you do see a slash -- i.e., if the string is:

abc\'def

then that would probably be generated with "abc\\'def", which would be
equivalent to:

"abc\\\'def"

I'm afraid I didn't quite follow the Summer's Day example. Can you
give another?


The context in which I saw the problem was the following:

The code takes in a name and a value and then evals them. If the
var_value has an unescaped single quote it would give an error that
the string was malformed.

to_eval = "#{var_name} = '#{var_value}'";
eval(to_eval, binding)

Looking at it now, I expect that a backslash in the var_value will
cause problems most of the time, as the string's contents get
interpolated a second time during the eval. Is there a better way to
set a value in a binding? The implementation has the option to set
values in a hash rather than in the binding but I'd like to keep both
if possible.
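Two possible ways around the quoting problem, sketched under assumptions that go beyond this thread: the helper names are hypothetical, and Binding#local_variable_set only exists in Ruby 2.1 and later, so it is not available on the 1.8.1 shown above. The first keeps the eval but lets String#inspect produce the literal; the second avoids generating source altogether.

```ruby
# Hypothetical helpers; neither name comes from the thread.

# Keep the eval, but let String#inspect build a double-quoted literal
# with quotes and backslashes already escaped.
def set_with_inspect(b, var_name, var_value)
  eval("#{var_name} = #{var_value.inspect}", b)
end

# Ruby 2.1+ only: assign directly, no source string involved.
def set_without_eval(b, var_name, var_value)
  b.local_variable_set(var_name.to_sym, var_value)
end

b = binding
tricky = "Summer's \\ Day"   # contains a quote and a backslash

set_with_inspect(b, "title", tricky)
set_without_eval(b, "subtitle", tricky)

p b.local_variable_get(:title) == tricky     # true
p b.local_variable_get(:subtitle) == tricky  # true
```

Either way the value never has to pass through a hand-written single-quoted literal, which is where the unescaped quote caused the malformed-string error.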

thank you,
Paul
 
Paul Rubel

James said:
What about:

gsub(/(\\*)'/) { |m| $1.length % 2 == 0 ? $1 + "\\'" : m }

That does look much better.

While I was experimenting I saw some behavior I don't understand and
am hoping someone can explain it to me:

prubel@cornet /tmp> cat /tmp/t.rb ; ruby /tmp/t.rb
2.times do
# replace not a slash followed by a quote with not a slash
# and an escaped quote.
puts("\\'Summer's Day".gsub(/([^\\\\])'/,"*#{$1}\\\\'*"))

The above line is problematic for two reasons. First, when using the
replacement string version of gsub(), your string is interpolated
before the method is even called let alone before any matches are made
so $1 and friends are not set. Instead, try using a \1 in a single
quoted string or \\1 in a double to get the value you're after.

I should have known. Thanks for the explanation.

Two, I don't understand your pattern. [^\\\\] means ONE character that
is not a slash and also not a slash. It's identical to [^\\]. I think
you meant to say, not two slashes, but that's a little harder to
express in a regex. And what if there are three slashes? See my
solution above for a different approach.

I meant to say "not a slash", but the interpolation in the replacement
confused me. After reading your response and thinking a bit, I believe
my head has been wrapped around the issue.

Hope that helps.

Very much.
thank you,
Paul
 
Florian Gross

James said:
Surprisingly, I don't believe Ruby yet supports lookbehind.

However it does support look-ahead which is enough in this case:

"foo bar don't \\'".gsub(/((?!\\).(?:\\{2})*)'/, "\\1\\\\'")
# result: foo bar don\'t \'


And since the escape string is only a single character:

"foo bar don't \\'".gsub(/([^\\](?:\\{2})*)'/, "\\1\\\\'")
# result: foo bar don\'t \'


(Note that this is basically your Regexp, but with some of the filtering
logic moved from the block to the Regexp itself. The replacement string
looks disgusting. I think your solution is way clearer.)
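The four backslashes are there because gsub gives \' its own meaning in a replacement string (the text after the match), so a literal backslash has to be written as \\ at the gsub level and doubled again for the string literal. The sketch below shows both effects, plus one case the string-only pattern appears to miss: since it requires a non-backslash character in front of the quote, a quote at the very start of the string is left alone. The method name is hypothetical.

```ruby
def escape_with_string(str)
  # "\\1"  -> \1 : the character (and any backslash pairs) before the quote
  # "\\\\" -> \\ : one literal backslash in the output
  str.gsub(/([^\\](?:\\{2})*)'/, "\\1\\\\'")
end

puts escape_with_string("foo bar don't \\'")  # foo bar don\'t \'
puts escape_with_string("'quoted'")           # 'quoted\'  (leading quote missed)

# With only two backslashes the replacement is \' which gsub expands
# to the post-match text ("b" here) instead of a backslash and a quote.
puts "a'b".gsub("'", "\\'")                   # abb
```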

Here is a sample with a multiple-width escape string:

"foo bar don't ESC'".gsub(/((?!ESC).{3}(?:(?:ESC){2})*)'/, "\\1ESC'")
# result: foo bar donESC't ESC'
 
