Nokogiri not pulling correct XPath

Scott B. · Feb 28, 2011

Hi everyone,

I was wondering if anyone could help me. I'm trying to pull text from a
website using Nokogiri, but not all of the text is being pulled into my
variables through XPath.

I have used Firebug (Firefox extension) to pull the correct XPath from
the page so I'm thinking it should be correct. So far, I have:

variable1 = (doc/"/html/body/div[2]/div[7]/div[4]/div[3]/div[6]/div/div/div/div/div/div/div/h2").inner_html

variable2 = (doc/"/html/body/div[3]/div[7]/div[4]/div[3]/div[6]/div/div/div/div/div/div[2]/table/tbody/tr/td[2]/strong").inner_html

variable3 = (doc/"/html/body/div[3]/div[7]/div[4]/div[3]/div[6]/div/div/div/div/div/div[2]/table/tbody/tr/td[2]/strong[2]").inner_html

Now, variable1 is working but I can't get any values out of variable2 or
variable3. Is there a different syntax I should be using? To test, I've
only been outputting to the CLI, but I eventually want to push these into
a SQLite3 database.

Anyone have any ideas?
Cheers.

Scott.

Luis G. · Feb 28, 2011

Hello...

I've been using Nokogiri for a while and I never had problems with it.
It works great.

I have some questions for you. Why do you use the full path to the h2
tag?
Does the h2 have a class or an id defined? What about the divs in
between: do they have a class or an id defined?

I'm asking because you can access the inner_html of an HTML tag like
this:

doc.xpath("//div[@class='(class of the div here)']/h2").each do |node|
  var = node.inner_html
end

You don't really need to put the full path to the HTML tag. You can also
use //div[@id='(id of the div here)'], for example.

The other variables are probably not working because you missed a div or
something else in between. I think the approach shown above makes it
easier to get the HTML content without making mistakes.

If you want, just let me know the URL you want to get the content from
and I'll build a small script to do that.

Regards,

Luis Goncalves

Robert Klemme · Feb 28, 2011

First I would dump the page _as loaded by your program_ (this is
important) to disk and verify that those XPaths do work independently
(e.g. with Firefox's DOM Inspector or Eclipse XML tools).

Kind regards

robert

Eric Christopherson · Feb 28, 2011

In my experience, Firebug shows a tbody element as part of the XPath,
even if there is no actual tbody tag in the HTML (the browser adds it to
the DOM when it parses the table). In that case, Nokogiri will fail to
find the right element unless you take out the 'tbody/'.

Scott B. · Mar 1, 2011

Thanks guys for the help. In the end, I think it had more to do with the
tbody than anything. I still couldn't get it working with XPath, however,
so I used CSS and was able to get it working that way (albeit in a
roundabout fashion using an array).

Cheers.

Scott.
