confused by back refs in gsub

Peter Bailey

Can someone tell me why the code below puts part of the original search
text into my substitution result when I'm not asking for it, or at
least don't think I'm asking for it?

Thanks,
Peter


Original line:
<registrantName>Normandy Group LLC</registrantName>

My Code:
xmlfile.gsub!(/<registrantName>(.*)<\/registrantName>/,
'<SUB.HEAD4>\&</SUB.HEAD4>')
I've tried "\1" instead of "\&", too. Same result. I've also tried
putting in "?" marks to make it non-greedy. Same result.

Yields:
<SUB.HEAD4><registrantName>Normandy Group LLC</registrantName></SUB.HEAD4>

What I want:
<SUB.HEAD4>Normandy Group LLC(/SUB.HEAD4>
 
Jano Svitok

This works for me (I've used \1):

require 'test/unit'

# Checks that \1 inserts only the captured group, not the whole match.
class TestGsub < Test::Unit::TestCase
  def test_replace
    line = "<registrantName>Normandy Group LLC</registrantName>"

    line.gsub!(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')
    # assert_equal takes the expected value first, then the actual one.
    assert_equal('<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>', line)
  end
end

Note that you have (/SUB.HEAD4> instead of </SUB.HEAD4> (the parenthesis)
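
To make the difference between the two back references concrete, here is a minimal sketch using the line from the original post. In a gsub replacement string, \& (like \0) stands for the entire matched text, tags included, while \1 stands only for what the first parenthesized group captured. The variable names are just for illustration.

```ruby
line = "<registrantName>Normandy Group LLC</registrantName>"
pattern = /<registrantName>(.*)<\/registrantName>/

# \& inserts the whole match, so the original tags come along with it.
whole_match = line.gsub(pattern, '<SUB.HEAD4>\&</SUB.HEAD4>')

# \1 inserts only the text captured by the first group (.*).
first_group = line.gsub(pattern, '<SUB.HEAD4>\1</SUB.HEAD4>')

puts whole_match
puts first_group
```

One thing to watch: in a double-quoted Ruby string, "\1" is an octal character escape rather than a back reference, so the replacement has to be written as '\1' or "\\1".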
 
Peter Bailey

Jano said:
This works for me (I've used \1):

require 'test/unit'
class TestGsub < Test::Unit::TestCase
def test_replace
line = "<registrantName>Normandy Group LLC</registrantName>"

line.gsub!(/<registrantName>(.*)<\/registrantName>/,'<SUB.HEAD4>\1</SUB.HEAD4>')
assert_equal(line, '<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>')
end
end

Note that you have (/SUB.HEAD4> instead of </SUB.HEAD4> (the parenthesis)


Thank you, Jano. Yes, this worked for me now.

Cheers.
 
Peter Bailey

Felix said:
If you're hardcoding replacements like that and are certain that your
source is well formed xml, you could also just skip the back references:

irb(main):001:0> "<registrantName>Normandy Group LLC</registrantName>".gsub!(/registrantName>/, 'SUB.HEAD4>')
=> "<SUB.HEAD4>Normandy Group LLC</SUB.HEAD4>"
irb(main):002:0>

I don't quite understand your suggestion, Felix. Yes, I believe my
source data is well-formed XML. Are you suggesting that, somehow,
because it is well-formed XML, I can ignore the element closings? I
tried what I thought you meant by:

xmlfile.gsub!(/<registrantName>/, '<SUB.HEAD4>')

and I got the subhead callout at the beginning of the data, but the
closing element is still there: </registrantName>

-Peter
 
Stefano Crocco

On Monday, 13 August 2007, Peter Bailey wrote:
I don't quite understand your suggestion, Felix. Yes, I believe my
source data is well-formed XML. Are you suggesting that, somehow,
because it is well-formed XML, I can ignore the element closings? I
tried what I thought you meant by:

xmlfile.gsub!(/<registrantName>/, '<SUB.HEAD4>')

and I got the subhead callout at the beginning of the data, but the
closing element is still there: </registrantName>

-Peter

What Felix is suggesting is that, if the source is valid XML, then it will
have the form

<elementName>text</elementName>

so, if you call gsub! passing a regexp matching elementName>, it should
replace both the opening and closing tags. When you tried, it didn't work
because you left the opening < in the regexp, which didn't match the closing
tag (it starts with </r, not <r). The correct call to gsub should be:

xmlfile.gsub!(/registrantName>/, 'SUB.HEAD4>')

(By the way, notice that because the regexp doesn't match the starting '<',
the '<' is left out of the replacement string as well.)

I hope this helps

Stefano
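
A small sketch of the two calls side by side, using the sample line from the original post (the variable names are only illustrative): with the leading "<" in the pattern only the opening tag is rewritten, without it both tags are.

```ruby
xml = "<registrantName>Normandy Group LLC</registrantName>"

# With the leading "<" the pattern matches the opening tag only;
# the closing tag begins with "</", so it is left untouched.
opening_only = xml.gsub(/<registrantName>/, '<SUB.HEAD4>')

# Without the leading "<" the pattern matches the tail of both tags.
both_tags = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

puts opening_only
puts both_tags
```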
 
Simon Krahnke

Peter said:
Thank you, Jano. Yes, this worked for me now.

Please note that regular expressions aren't a very good way to parse
XML. The (.*) subgroup above is greedy: it will match everything between
the first "<registrantName>" and the last "</registrantName>" on a line,
which is probably not what you want.

You can use the non-greedy *? as a workaround in this case.

Best regards, simon
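
The greedy versus non-greedy behaviour is easiest to see with two elements on one line. This is only a sketch: the second element ("Acme Inc") is a made-up value added for illustration and does not come from the original data.

```ruby
# Two elements on the same line; "Acme Inc" is a hypothetical second entry.
line = "<registrantName>Normandy Group LLC</registrantName>" +
       "<registrantName>Acme Inc</registrantName>"

# Greedy (.*) runs from the first opening tag to the LAST closing tag.
greedy = line.gsub(/<registrantName>(.*)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

# Non-greedy (.*?) stops at the first closing tag, so each element
# is matched and replaced separately.
lazy = line.gsub(/<registrantName>(.*?)<\/registrantName>/, '<SUB.HEAD4>\1</SUB.HEAD4>')

puts greedy
puts lazy
```

With the greedy pattern the inner tags survive inside a single SUB.HEAD4 element; with *? each element is converted on its own.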
 
Peter Bailey

Simon said:
As well as any substring "registrantName>". And well-formed XML won't
guarantee that only "<registrantName>" and "</registrantName>" will
contain that.

gsub!(/(<\/?)registrantName>/, '\1SUB.HEAD4>') should do.

But again, CDATA-sections and comments may well contain these strings.
I'd use XSLT or some SAX library if it has to be Ruby.

Best regards, simon

Thank you, everyone. Yes, my XML is well-formed, but, it's also pretty
simple, and, from what our vendor tells me, pretty consistent. I just
need to convert it to SGML for our company publishing system. XSLT is
probably better for this, I'm sure, but, it's enough for me just to
learn Ruby. (-: Plus, I love Ruby.
Thanks again.
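
For reference, Simon's point about stray substrings can be checked with a short sketch. The input below is hypothetical: a made-up <note> element whose text happens to contain "registrantName>" was added to show where the plain substring pattern and the tag-anchored pattern differ.

```ruby
# Hypothetical input: the <note> text mentions "registrantName>" literally.
xml = "<note>see registrantName> below</note>" +
      "<registrantName>Normandy Group LLC</registrantName>"

# Plain substring pattern: also rewrites the text inside <note>.
substring = xml.gsub(/registrantName>/, 'SUB.HEAD4>')

# Tag-anchored pattern: (<\/?) captures "<" or "</" and \1 puts it back,
# so only real opening and closing tags are rewritten.
tags_only = xml.gsub(/(<\/?)registrantName>/, '\1SUB.HEAD4>')

puts substring
puts tags_only
```

As noted in the thread, even the anchored pattern can still be fooled by comments or CDATA sections, so it is only suitable for simple, consistent input.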
 
