bug in SGML parser

cesium62

The SGML parser
/usr/local/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb
does not correctly handle numeric character references that have a
leading zero. Yahoo sometimes generates a character reference that
looks like "&#039;". Firefox displays this as a single quote;
sgml-parser.rb raises an exception.

rescued: invalid value for Integer: "039" at Wed Dec 06 18:02:56 PST 2006
/usr/local/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:335:in `Integer'
/usr/local/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:335:in `handle_charref'
/usr/local/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:159:in `goahead'
/usr/local/lib/ruby/gems/1.8/gems/htmltools-1.10/lib/html/sgml-parser.rb:88:in `feed'
/usr/local/lib/ruby/gems/1.8/gems/rubyful_soup-1.0.4/lib/rubyful_soup.rb:547:in `feed'
/usr/local/lib/ruby/gems/1.8/gems/rubyful_soup-1.0.4/lib/rubyful_soup.rb

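Possible fix: in handle_charref, strip leading zeroes from 'name'
before calling 'Integer'. Integer treats a leading zero as an octal
prefix, so "039" is rejected because 9 is not an octal digit.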
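
A rough sketch of what that fix could look like. I haven't pasted the real handle_charref here, so charref_to_codepoint is a made-up helper name standing in for the conversion step, and it only covers decimal references:

```ruby
# Hypothetical helper, NOT the actual htmltools code: converts the digits of a
# decimal character reference (the "039" in "&#039;") to a code point.
def charref_to_codepoint(name)
  # Drop leading zeroes so Integer does not read the string as octal,
  # but keep at least one digit so that "0" does not become "".
  Integer(name.sub(/\A0+(?=\d)/, ""))
end

# The failure from the report: a bare Integer call rejects "039".
begin
  Integer("039")
rescue ArgumentError => e
  puts "rescued: #{e.message}"
end

puts charref_to_codepoint("039")      # 39
puts charref_to_codepoint("039").chr  # '
```

On newer Rubies, passing an explicit base, Integer(name, 10), should give the same result without the regexp, but I have not checked whether the 1.8 series accepts that second argument.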
 
