Backslash sequences \1 and \2 in regexes (backreferences)

bbxx789_05ss

Is this behavior documented anywhere?

1)
puts "fred:smith".gsub(/(\w+):(\w+)/, '\2, \1')

--output:--
smith, fred

2)
puts "abc".gsub(/a(b)(c)/, "a\2\1")

--output:--
a

The double quotes surrounding the replacement string cause the backslash
sequences to stop working. With single quotes the backslash sequences
work. I can't find anything in pickaxe2 about that. My understanding
was that double quotes allowed for more substitutions than single
quotes. This appears to be a case where double quotes allow fewer
substitutions than single quotes.
 
mike

Inside double quotes, Ruby converts the \1 and \2 into single characters
before gsub ever sees them.

ratdog:~ mike$ ruby -e 'puts "abc".gsub(/a(b)(c)/, "a\2\1")' | od -c
0000000 a 002 001 \n
0000004

ratdog:~ mike$ irb
irb(main):001:0> 'a\1\2'.length
=> 5
irb(main):002:0> "a\1\2".length
=> 3
irb(main):003:0> "a\2\1"
=> "a\002\001"

Inside the double quotes, the \2 and \1 each become a single character.

Table 22.2 in The Basic Types says \nnn goes to Octal nnn, and here
you see 8 (not a valid octal digit) doesn't get treated the same way
as 1 and 2:

irb(main):004:0> "a\2\1\8"
=> "a\002\0018"
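
A minimal sketch of the same point in script form, assuming a reasonably recent Ruby where String#bytes returns an array. The variable names single and double are only for illustration.

```ruby
# Compare what the parser stores for each kind of literal.
single = 'a\2\1'   # backslashes are kept: a, \, 2, \, 1
double = "a\2\1"   # octal escapes: a, byte 2, byte 1

p single.bytes     # [97, 92, 50, 92, 49]
p double.bytes     # [97, 2, 1]

# One escape can consume up to three octal digits.
p "\101"           # "A" (octal 101 is decimal 65)
```

Only the single-quoted version still contains the backslashes that gsub looks for in a replacement string.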

Hope this helps,

Mike

--

Mike Stok
http://www.stok.ca/~mike/

The "`Stok' disclaimers" apply.
 
Phrogz

Is this behavior documented anywhere:

Yes. In many Ruby books, in at least one Ruby FAQ, and many, many
times on the ruby mailing list/forum/newsgroup.
 
bbxx789_05ss

Mike said:
Table 22.2 in The Basic Types says \nnn goes to Octal nnn,

Ah. So, \1 and \2 are interpreted as octal character codes. I was
using the following puts statement to debug:

puts "abc".gsub(/a(b)(c)/, "a\2\1") + "<---"

--output:--
a<---


I should have been using:

p "abc".gsub(/a(b)(c)/, "a\2\1")

--output:--
"a\002\001"

Since the ASCII codes 1 and 2 represent non-printable characters, I got
no output for them using puts.
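
For anyone debugging the same way, here is a small sketch of the difference. The name result is made up for the example, and the exact escape notation that p prints may differ between Ruby versions, so no output is shown for it.

```ruby
result = "abc".gsub(/a(b)(c)/, "a\2\1")

puts result           # control characters are invisible on most terminals
p result              # prints the escaped (inspect) form instead
puts result.length    # 3
p result.bytes        # [97, 2, 1]
```

Checking length or bytes shows the two extra characters even when the terminal displays nothing for them.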

My question stemmed from this passage about gsub() in pickaxe2 on p.
613:

"If a string is used as the replacement, special variables from the
match (such as $& and $1) cannot be substituted into it, as the
substitution into the string occurs before the pattern match starts.
However, the sequences \1, \2 and so on may be used to interpolate
successive groups in the match."

That makes it sound like \1 and \2 can be freely used in the replacement
string. There is no mention of the fact that single quotes are required
to keep them from being interpreted as chars written in octal. That
description is very misleading.
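
To illustrate the first half of that passage, here is a rough sketch of why $1 cannot be used in a replacement string, while a block can use it. The earlier "zzz" match and the names stale and fresh are assumptions added only for the demonstration.

```ruby
"zzz" =~ /(z+)/   # an earlier match leaves $1 == "zzz"

# "#{$1}" is built before gsub runs, so it uses the stale $1.
stale = "fred:smith".gsub(/(\w+):(\w+)/, "#{$1}")

# Inside a block, $1 and $2 refer to the current match.
fresh = "fred:smith".gsub(/(\w+):(\w+)/) { "#{$2}, #{$1}" }

p stale   # "zzz"
p fresh   # "smith, fred"
```

The block form is one way to avoid the quoting question altogether, since no backslash sequences are involved.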
 
m_goldberg

That makes it sound like \1 and \2 can be freely used in the replacement
string. There is no mention of the fact that single quotes are required
to keep them from being interpreted as chars written in octal. That
description is very misleading.

No, it's not. That single quotes are required has nothing to do with
gsub. It's something you should know from your understanding of how
the Ruby interpreter handles double quoted strings. As Mike Stok said
the string literal is converted to "a\002\001" long before gsub is
called.

Regards, Morton
 
ed.odanow

Morton said:
No, it's not. That single quotes are required has nothing to do with
gsub. It's something you should know from your understanding of how
the Ruby interpreter handles double quoted strings. As Mike Stok said
the string literal is converted to "a\002\001" long before gsub is
called.

Regards, Morton

You should simply double the backslashes, which works with either kind
of quote:

irb(main):001:0> puts "fred:smith".gsub(/(\w+):(\w+)/, '\\2, \\1')
smith, fred
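
A short sketch comparing the spellings side by side. The names pattern, input, a, b and c are illustrative only.

```ruby
pattern = /(\w+):(\w+)/
input   = "fred:smith"

a = input.gsub(pattern, '\2, \1')     # single quotes, single backslash
b = input.gsub(pattern, '\\2, \\1')   # single quotes, doubled backslash
c = input.gsub(pattern, "\\2, \\1")   # double quotes, doubled backslash

p [a, b, c]        # all three are "smith, fred"
p '\\2' == '\2'    # true: both are a backslash followed by "2"
```

The doubled form has the advantage that it means the same thing whichever quotes surround it.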

Wolfgang Nádasi-Donner
 
