confused by back refs in gsub

Peter Bailey · Aug 13, 2007

Can someone tell me why my code below puts part of the original match
into the result of my substitution, when I'm not asking for it (or at
least I don't think I'm asking for it)?

Thanks,
Peter

Original line:
<registrantName>Normandy Group LLC</registrantName>

My Code:
xmlfile.gsub!(/<registrantName>(.*)<\/registrantName>/,
'<SUB.HEAD4>\&</SUB.HEAD4>')
I've tried "\1" instead of "\&" too, with the same result. I've also
tried adding "?" to make it non-greedy, with the same result.

Yields:
<SUB.HEAD4><registrantName>Normandy Group LLC</registrantName></SUB.HEAD4>

What I want:
<SUB.HEAD4>Normandy Group LLC(/SUB.HEAD4>
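A side note on quoting, since "\1" was also tried: a back-reference only reaches gsub if the backslash survives Ruby's own string parsing. The sketch below (variable names are illustrative) compares single quotes, plain double quotes, and double quotes with a doubled backslash on the sample line from the question.

```ruby
line    = "<registrantName>Normandy Group LLC</registrantName>"
pattern = /<registrantName>(.*)<\/registrantName>/

# Single quotes keep the backslash, so gsub sees the back-reference \1
single = line.gsub(pattern, '<SUB.HEAD4>\1</SUB.HEAD4>')

# In double quotes, "\1" is an octal escape (the byte 0x01), so gsub
# never sees a back-reference at all
broken = line.gsub(pattern, "<SUB.HEAD4>\1</SUB.HEAD4>")

# Doubling the backslash inside double quotes restores the back-reference
doubled = line.gsub(pattern, "<SUB.HEAD4>\\1</SUB.HEAD4>")

p single
p broken
p doubled
```

This is why single-quoted replacement strings are the usual choice with gsub.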

Jano Svitok · Aug 13, 2007

This works for me (I've used \1):

require 'test/unit'

class TestGsub < Test::Unit::TestCase
  def test_replace
    line = "<registrantName>Normandy Group LLC</registrantName>"

    # \1 refers to the text captured by the first (...) group
    line.gsub!(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

    # Test::Unit's convention is assert_equal(expected, actual)
    assert_equal('<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>', line)
  end
end

Note that your expected output has (/SUB.HEAD4> instead of </SUB.HEAD4> (a parenthesis where the < should be).
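The difference between the two attempts comes down to what each escape means in a replacement string: \& (also written \0) is the entire match, tags included, while \1 is only the text captured by the first (...) group. A minimal comparison on the sample line from the question; the block form at the end is an alternative not used in the thread.

```ruby
line    = "<registrantName>Normandy Group LLC</registrantName>"
pattern = /<registrantName>(.*)<\/registrantName>/

# \& inserts the whole match, so the original tags come along
whole = line.sub(pattern, '<SUB.HEAD4>\&</SUB.HEAD4>')

# \1 inserts only what the first (...) group captured
group = line.sub(pattern, '<SUB.HEAD4>\1</SUB.HEAD4>')

# Block form: inside the block, $1 holds the first captured group
block = line.sub(pattern) { "<SUB.HEAD4>#{$1}</SUB.HEAD4>" }

puts whole
puts group
puts block
```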

Peter Bailey · Aug 13, 2007

Thank you, Jano. Yes, this worked for me now.

Cheers.

Peter Bailey · Aug 13, 2007

Felix said:
If you're hardcoding replacements like that and are certain that your
source is well-formed XML, you could also just skip the back references:

irb(main):001:0> "<registrantName>Normandy Group LLC</registrantName>".gsub!(/registrantName>/, 'SUB.HEAD4>')
=> "<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>"
irb(main):002:0>

I don't quite understand your suggestion, Felix. Yes, I believe my
source data is well-formed XML. Are you suggesting that, because it is
well-formed XML, I can somehow ignore the closing tags? I tried what I
thought you meant:

xmlfile.gsub!(/<registrantName>/, '<SUB.HEAD4>')

and I got the subhead callout at the beginning of the data, but the
closing tag is still there: </registrantName>

-Peter

Stefano Crocco · Aug 13, 2007

What Felix is suggesting is that, if the source is valid XML, each
element will have the form

<elementName>text</elementName>

so if you call gsub! with a regexp matching elementName>, it should
replace both the opening and the closing tags. When you tried, it didn't
work because you left the opening < in the regexp, which doesn't match
the closing tag (that starts with </r, not <r). The correct call is:

xmlfile.gsub!(/registrantName>/, 'SUB.HEAD4>')

(By the way, notice that since the regexp doesn't match the leading '<',
the '<' is also left out of the replacement string.)

I hope this helps

Stefano
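Stefano's explanation can be checked side by side on the sample line (variable names are illustrative): keeping the < in the pattern only reaches the opening tag, while dropping it lets one pattern hit both tags.

```ruby
xml = "<registrantName>Normandy Group LLC</registrantName>"

# With "<" in the pattern, only the opening tag matches; the closing
# tag begins with "</", so it is left untouched
opening_only = xml.gsub(/<registrantName>/, '<SUB.HEAD4>')

# Without "<", the pattern matches inside both "<registrantName>" and
# "</registrantName>", and the "<" and "</" stay in place
both = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

puts opening_only
puts both
```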

Simon Krahnke · Aug 13, 2007

Thank you, Jano. Yes, this worked for me now.

Please note that regular expressions aren't a very good way to parse
XML. The capture group in the expression above will match everything
between the first "<registrantName>" and the last "</registrantName>",
which is probably not what you want.

You can use the non-greedy *? as a workaround in this case.

Regards, simon .... l
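To see Simon's point, try a line holding two registrantName elements (the sample data is made up for illustration): the greedy .* runs on to the last closing tag, while the non-greedy .*? stops at the first one.

```ruby
line = "<registrantName>A</registrantName><registrantName>B</registrantName>"

# Greedy: (.*) grabs as much as possible, swallowing the inner tags
greedy = line.gsub(/<registrantName>(.*)<\/registrantName>/,
                   '<SUB.HEAD4>\1</SUB.HEAD4>')

# Non-greedy: (.*?) stops at the first closing tag, so each element
# is replaced separately
lazy = line.gsub(/<registrantName>(.*?)<\/registrantName>/,
                 '<SUB.HEAD4>\1</SUB.HEAD4>')

puts greedy
puts lazy
```

Without the /m flag, . does not match a newline, so neither pattern would match an element whose text is split across lines.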

Peter Bailey · Aug 14, 2007

Simon said:
It will also match any other occurrence of the substring
"registrantName>", and well-formed XML doesn't guarantee that it appears
only in "<registrantName>" and "</registrantName>".

gsub!(/(<\/?)registrantName>/, '\1SUB.HEAD4>') should do.

But again, CDATA sections and comments may well contain these strings.
I'd use XSLT, or a SAX library if it has to be Ruby.

Regards, simon .... l
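Simon's pattern can be compared with Felix's on a made-up string where registrantName> appears as plain text (the lambda names and sample text are hypothetical). The (<\/?) group only matches when the name directly follows < or </, and \1 puts whichever it was back in front of the new name.

```ruby
element = "<registrantName>Normandy Group LLC</registrantName>"
prose   = "<note>uses registrantName> tags</note>"  # hypothetical sample text

# Simon's version: the name must follow "<" or "</", which \1 restores
tag_only = ->(s) { s.gsub(/(<\/?)registrantName>/, '\1SUB.HEAD4>') }

# Felix's version: any occurrence of "registrantName>" is replaced
any_match = ->(s) { s.gsub(/registrantName>/, 'SUB.HEAD4>') }

puts tag_only.call(element)   # both tags renamed
puts any_match.call(prose)    # plain text is rewritten too
puts tag_only.call(prose)     # plain text is left alone
```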

Thank you, everyone. Yes, my XML is well-formed, but it's also pretty
simple and, from what our vendor tells me, pretty consistent. I just
need to convert it to SGML for our company's publishing system. XSLT is
probably better for this, but learning Ruby is enough for me right now.
(-: Plus, I love Ruby.
Thanks again.
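For anyone who does want a parser instead of regular expressions, here is a minimal sketch using REXML, which ships with Ruby (as a bundled gem since Ruby 3.0). It uses REXML's tree (DOM) API rather than the SAX approach Simon mentions, and the <record> wrapper and the comment in the sample input are made up for illustration. The comment's text is left alone while the real element is renamed.

```ruby
require 'rexml/document'

# Hypothetical input: one real element plus a comment mentioning the tag
source = '<record><!-- old registrantName> tag -->' \
         '<registrantName>Normandy Group LLC</registrantName></record>'

doc = REXML::Document.new(source)

# Rename every registrantName element; comments and text nodes are untouched
REXML::XPath.each(doc, '//registrantName') { |el| el.name = 'SUB.HEAD4' }

output = doc.to_s
puts output
```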

Similar Threads
What's going on here with gsub?	3	Jul 19, 2007
gsub pattern substitution and ${...}	7	May 11, 2009
using a variable as 1st param in gsub	2	Jan 12, 2008
gsub and slash-ampersand in the substitution string	2	Oct 9, 2006
gsub(/\s*$/, "") doubling string	2	Jul 21, 2003
confused with class	2	May 22, 2009
mysterious memory corruption, very confused	10	Jun 29, 2008
Find and replace with values from array with gsub	5	Mar 3, 2010
