Escaping single quotes in XPath query with REXML

Francis Hwang

Anybody tried to use XPath in REXML with a single quote, only to run
into the fact that quote escaping in XPath is apparently not accounted
for? If this were in the context of XSLT I'd be able to assign some
annoying temp variable like $apos, but it's not, so I can't.

irb(main):001:0> require 'rexml/document'
=> true
irb(main):002:0> include REXML
=> Object
irb(main):003:0> xml = "<rss version='2.0'><channel><item><title>John's Doe</title></item></channel></rss>"
=> "<rss version='2.0'><channel><item><title>John's Doe</title></item></channel></rss>"
irb(main):004:0> xmldoc = Document.new xml
=> <UNDEFINED> ... </>
irb(main):005:0> XPath.first( xmldoc, "/rss/channel/item/title" ).to_s
=> "<title>John's Doe</title>"
irb(main):006:0> XPath.first( xmldoc,
"/rss/channel/item/title[text()='John's Doe']" ).to_s
NoMethodError: undefined method `node_type' for "John":String
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:124:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `each'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:402:in
`Predicate'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:346:in
`Predicate'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:204:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in
`times'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:34:in `parse'
from /usr/local/lib/ruby/1.8/rexml/xpath.rb:28:in `first'
from (irb):6
irb(main):007:0> XPath.first( xmldoc,
"/rss/channel/item/title[text()='John\'s Doe']" ).to_s
NoMethodError: undefined method `node_type' for "John":String
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:124:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in `each'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:123:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:402:in
`Predicate'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:346:in
`Predicate'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:204:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in
`times'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:199:in
`internal_parse'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:49:in `match'
from /usr/local/lib/ruby/1.8/rexml/xpath_parser.rb:34:in `parse'
from /usr/local/lib/ruby/1.8/rexml/xpath.rb:28:in `first'
from (irb):7
 
Brian Candler

irb(main):006:0> XPath.first( xmldoc,
"/rss/channel/item/title[text()='John's Doe']" ).to_s

I'm no expert in XPath, but that looks like a broken XPath query because of
the three single quotes.
irb(main):007:0> XPath.first( xmldoc,
"/rss/channel/item/title[text()='John\'s Doe']" ).to_s

That's identical, as you'll see if you try this:

irb(main):001:0> a="text()='John\'s Doe'"
=> "text()='John's Doe'"

You've not inserted a backslash into the string, you just escaped the quote,
and the escaping was removed. You need two backslashes to insert a single
backslash into the string:

irb(main):002:0> a="text()='John\\'s Doe'"
=> "text()='John\\'s Doe'"

(Despite how it looks, there is only a single backslash in there; it's shown
as two because it's inside a double-quoted string, to make it valid Ruby)

irb(main):003:0> a.each_byte { |c| print c.chr," " }
t e x t ( ) = ' J o h n \ ' s D o e ' => "text()='John\\'s Doe'"
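
The point about backslashes can be checked without REXML at all. This is only a small sketch of Ruby's double-quoted string escaping; the variable names are made up for illustration:

```ruby
# In a double-quoted Ruby literal, \' is just a single quote:
# the backslash never reaches the string.
unescaped = "text()='John\'s Doe'"
plain     = "text()='John's Doe'"
same      = (unescaped == plain)

# Two backslashes in the literal produce one real backslash in the string.
escaped     = "text()='John\\'s Doe'"
backslashes = escaped.chars.count("\\")
extra_chars = escaped.length - plain.length

puts same          # the two literals are the same string
puts backslashes   # number of real backslash characters
puts extra_chars   # how much longer the escaped string is
```

So the query in irb line 007 was byte-for-byte the same as the one in line 006, which is why both failed identically.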

However, I've just had a quick scan through the XPath-1.0 spec, and I don't
think that's how you do it. You can include single quotes inside a
double-quoted string, and vice versa. But probably what you want for the
general case is XML character entities: &#39; or &apos;

Try passing your string through this before constructing your XPath query:

require 'rexml/text'
a = "John's Doe"
b = REXML::Text::normalize(a)
#=> "John&apos;s Doe"

HTH,

Brian.
 
Brian Candler

Try passing your string through this before constructing your XPath query:

require 'rexml/text'
a = "John's Doe"
b = REXML::Text::normalize(a)
#=> "John&apos;s Doe"

Hmm, that doesn't work.

irb(main):007:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John&apos;s Doe']" ).to_s
=> ""
irb(main):008:0> XPath.first( xmldoc, "/rss/channel/item/title[text()='John&#39;s Doe']" ).to_s
=> ""
irb(main):009:0> XPath.first( xmldoc, "/rss/channel/item/title[text()=\"John's Doe\"]" ).to_s
=> "<title>John's Doe</title>"

You might want to raise that with the REXML author. In the meantime, if you
know the string only contains single quotes, then you can surround it with
double quotes in the XPath query, as in the third query above.
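
For the general case, where the value might contain either kind of quote, one possible approach is to pick the delimiter based on the value, and fall back to the XPath 1.0 concat() function when both kinds appear. The helper below is a hypothetical sketch (xpath_literal is not part of REXML), and it assumes the XPath engine in use implements concat() per the spec, which I have not verified against REXML here:

```ruby
# Hypothetical helper: turn a Ruby string into an XPath 1.0 string literal.
# XPath 1.0 has no escape character inside literals, so the only options
# are choosing the other quote type or assembling the value with concat().
def xpath_literal(value)
  # No single quote inside: single-quote delimiters are safe.
  return "'#{value}'" unless value.include?("'")
  # Single quotes but no double quotes: use double-quote delimiters.
  return "\"#{value}\"" unless value.include?('"')

  # Both kinds present: split on single quotes, wrap each piece in single
  # quotes, and rejoin the pieces with a double-quoted "'" literal.
  parts = value.split("'", -1).map { |part| "'#{part}'" }
  "concat(#{parts.join(%q{, "'", })})"
end

puts xpath_literal("John's Doe")
puts xpath_literal("plain")
puts xpath_literal(%q{say "it's"})
```

The result would then be interpolated into the query, e.g. "/rss/channel/item/title[text()=#{xpath_literal(title)}]", where title is whatever string you are searching for.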

Regards,

Brian.
 
