Nokogiri bug or intended effect??

Jeremy Woertink · May 3, 2010

I'm trying to parse this (poorly formatted) page, and when I look at the
page I see:

Name: ZITO, PEDRO OSVALDO

When I look at the source I get:

<td colspan="4" >
Name: ZITO, PEDRO OSVALDO 

</td>

When I parse the page I get:

page.search("/html/body/table[3]/tr[1]/td[4]/table/tr[1]/td[1]/table/tr[3]/td[2]/table/tr[1]/td[1]/table/tr[2]/td[1]").first

=> #<Nokogiri::XML::Element:0x1eb1f76 name="td"
attributes=[#<Nokogiri::XML::Attr:0x1eb1eea name="colspan" value="4">]
children=[#<Nokogiri::XML::Element:0x1eb0d24 name="font"
attributes=[#<Nokogiri::XML::Attr:0x1eb0c7a name="face" value="Arial">,
#<Nokogiri::XML::Attr:0x1eb0c70 name="size" value="2">]
children=[#<Nokogiri::XML::Text:0x1eb0694 "
\r\n ">, #<Nokogiri::XML::Element:0x1eb066c name="b"
children=[#<Nokogiri::XML::Text:0x1eb0496 "Name: ">]>,
#<Nokogiri::XML::Text:0x1eb03c4 "ZITO,PEDROOSVALDO
">]>, #<Nokogiri::XML::Text:0x1eaf636 " \r\n
\r\n ">]>

Notice that in the text node #<Nokogiri::XML::Text:0x1eb03c4 "ZITO,PEDROOSVALDO
"> all the spaces in the name have been removed.

Here's what I'm using:
=> "2.7.3"
macbook-pro:~ jeremywoertink$ ruby -v
ruby 1.8.6 (2009-06-08 patchlevel 369) [universal-darwin9.0]

Does anyone have any ideas? My guess is that it's an encoding issue. There are
other areas in the page where I have to call string.gsub("\302\240", "").
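
For reference, "\302\240" is the two-byte UTF-8 encoding of U+00A0, the non-breaking space. If the gaps in the name are non-breaking spaces too (an assumption, nothing here confirms it), replacing them with "" would squash the words together, while replacing them with a regular space would not. A minimal sketch, assuming Ruby 1.9 or later; normalize_spaces and the sample string are made up for illustration and are not taken from the real page:

```ruby
# "\302\240" (octal escapes) and "\u00A0" are the same two bytes, C2 A0, in UTF-8.
NBSP = "\u00A0"

# Hypothetical helper: turn non-breaking spaces into ordinary spaces
# instead of deleting them, so adjacent words stay separated.
def normalize_spaces(text)
  text.gsub(NBSP, " ")
end

# Stand-in for the parsed text node (assumed, not the actual page content).
raw = "ZITO,#{NBSP}PEDRO#{NBSP}OSVALDO"

# Dump the bytes as hex; "C2 A0" pairs would indicate non-breaking spaces.
puts raw.bytes.map { |b| format("%02X", b) }.join(" ")

puts normalize_spaces(raw)
```

Dumping the bytes of the real text node the same way should show whether the missing spaces are really C2 A0 pairs or something else entirely.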

Thanks,

~Jeremy

G_ F_ · May 4, 2010

Try using the .content() or .text() methods to get the text content of
the nodes.

Mike Dalessio · May 4, 2010

If you post this question to nokogiri-talk with a reproducible test case, I
think you'll quickly get a response from the helpful nokogiri community.

Jeremy Woertink · May 4, 2010

G_ F_ said:
Try using the .content() or .text() methods to get the text content of
the nodes.

Yeah, I tried that. It just returns the name all squished. Any other
ideas?

Jeremy Woertink · May 5, 2010

Cool, I'll try that. Thanks man.

~Jeremy

Mike said:
If you post this question to nokogiri-talk with a reproducible test case, I
think you'll quickly get a response from the helpful nokogiri community.
