Parsing XML with Ruby

  • Thread starter jackster the jackle
jackster the jackle

I need to hit an https link and pass a username and password in order to
pull down some records in xml format. I was thinking that the easiest
way to do this is to shell out to curl and then parse my xml provided
that I could pass the username/password in the url.

Can anyone recommend an easy way to accomplish this?

thanks

jackster
 
Ammar Ali

> I need to hit an https link and pass a username and password in order to
> pull down some records in xml format. I was thinking that the easiest
> way to do this is to shell out to curl and then parse my xml provided
> that I could pass the username/password in the url.
>
> Can anyone recommend an easy way to accomplish this?

The Ruby way of doing this would be to use Net::HTTP from the
standard library. It does HTTPS too. The docs are at:

http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html

As for parsing XML, there are a few options, but Nokogiri is probably
the easiest to learn and deal with. Easily googlable, and there are a
few blog posts with samples.

HTH, and have fun!
Ammar
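
For readers who want a starting point for the parsing half of the question, here is a minimal sketch. It uses REXML rather than the Nokogiri gem recommended above, only because REXML ships with standard Ruby installs. The XML document, its element names (`records`, `record`, `name`) and the `id` attribute are invented for illustration and are not taken from the original poster's feed.

```ruby
require 'rexml/document'

# Hypothetical XML payload standing in for the records a server might return.
xml = <<~XML
  <records>
    <record id="1"><name>alpha</name></record>
    <record id="2"><name>beta</name></record>
  </records>
XML

doc = REXML::Document.new(xml)

# Collect each record's id attribute and name text into an array of hashes.
records = REXML::XPath.match(doc, '//record').map do |rec|
  { id: rec.attributes['id'], name: rec.elements['name'].text }
end

records.each { |r| puts "#{r[:id]}: #{r[:name]}" }
```

In a real script the `xml` string would be replaced by the body of the HTTP response.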
 
jackster the jackle

Thanks a lot for the advice. I have used Net::HTTP a lot in the past but
could never get it working with HTTPS, but I'll read the docs again and
have at it....

jack
 
Ammar Ali

> Thanks a lot for the advice. I have used Net::HTTP a lot in the past but
> could never get it working with HTTPS, but I'll read the docs again and
> have at it....

The difference between http and https with Net::HTTP can be summed up by:

require 'net/https' # extra require

http = Net::HTTP.new('server.net', 443) # note the port
http.use_ssl = true # turn it on

Regards,
Ammar
 
jackster the jackle

It seems to be working...here is my test code:

http = Net::HTTP.new('www.chase.com', 443) # note the port
http.use_ssl = true # turn it on
puts "Here is the page: #{http} "

The only problem is, I get this returned:

Here is the page: #<Net::HTTP:0x31b31d8>

That looks like an array, how do I get the data out of it, should I use
an "each do" statement?

thanks

jack
 
Ammar Ali

> It seems to be working...here is my test code:
>
> http = Net::HTTP.new('www.chase.com', 443) # note the port
> http.use_ssl = true # turn it on
> puts "Here is the page: #{http} "
>
> The only problem is, I get this returned:
>
> Here is the page: #<Net::HTTP:0x31b31d8>

That's not the page. That's the instance of Net::HTTP, the web client,
which you have to use to fetch the page.

> That looks like an array, how do I get the data out of it, should I use
> an "each do" statement?

Please read the documentation to find out how to use the instance you
created.

Good luck,
Ammar
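
To make the distinction concrete: `Net::HTTP.new` only builds a client object and does not contact the server, which is why interpolating it into a string prints an object description rather than a page. A small sketch that runs without any network access (`example.com` is a placeholder host, not the poster's server):

```ruby
require 'net/http'

# Creating the client does not open a connection or fetch anything.
http = Net::HTTP.new('example.com', 443)
http.use_ssl = true

puts http.class      # Net::HTTP, a client object, not an Array or a String
puts http.address    # the host the client will connect to
puts http.port       # the port the client will connect to
puts http.use_ssl?   # true once SSL/TLS has been switched on
puts http.started?   # false: no connection has been opened yet

# The page only exists after a request is made, for example:
#   response = http.get('/')   # performs the HTTPS request
#   puts response.body         # the body of the response as a String
```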
 
jackster the jackle

Ammar Ali wrote in post #960938:
> Please read the documentation to find out how to use the instance you
> created.
>
> Good luck,
> Ammar


What, have you suddenly decided to play the "school master"? I can't
stand people like you who think they are smarter than everyone else and
pretend to help by giving stupid little hints....get lost Ammar
 
Ammar Ali

> What, have you suddenly decided to play the "school master"? I can't
> stand people like you who think they are smarter than everyone else and
> pretend to help by giving stupid little hints....

In an earlier post you wrote: "I have used Net::HTTP a lot in the
past". Your last question clearly indicates that you are not telling
the truth. And it shows that you didn't even try to read the docs. I
was willing to help, but I'm not willing to do the work for you.

> get lost Ammar

With pleasure.

Good luck,
Ammar
 
Peter Hickman

There is this really cool tool that all us programmers know about but
we never tell the noobs. But as you asked so nicely I will tell you. It's
called Google :)

Try this query "read an https url in ruby", look at the first result.
Looks interesting?

As you have displayed no ability to solve your own problems and an
inflated sense of your own importance I think that you will probably
go nowhere. Ammar has done his best to help you, you however have
expected everything to be given to you on a plate. Behave like a child
and you will be treated like a child.

TTFN
 
Phillip Gawlowski

> What, have you suddenly decided to play the "school master"? I can't
> stand people like you who think they are smarter than everyone else and
> pretend to help by giving stupid little hints....get lost Ammar

Aggression won't get you anywhere. So, go read the documentation, and
if you have specific questions, we are happy to help. But read the
docs first. That's what it's there for.

--
Phillip Gawlowski

 
jackster the jackle

I'm not a developer..I'm a network engineer and have been for 15
years...I have also been a member of this forum (off and on) for the
last 5 years or so.

Believe me, I google everything and try and figure it out before I post
on the forum....I'm sorry, I don't understand everything I read in the
docs like you guys do...much of it is a foreign language to me.

I have been able to get some things working over the years by trial and
error and by modifying and expanding some base snippets of code...much
of it with help I received on this forum from people who were willing to
help even dummies like me.

If someone I didn't know asked me networking questions I wouldn't tell
them to RTFM...that is what know-it-alls do.

I appreciate any and all help from Ammar and I'm sorry I got upset but I
can't deal with people that deliberately hold back info to try and teach
people some kind of lesson.
 
botp

> It seems to be working...here is my test code:
>
> http = Net::HTTP.new('www.chase.com', 443) # note the port
> http.use_ssl = true # turn it on
> puts "Here is the page: #{http} "

just continue... do not be afraid to try...

response = http.get('/index.html')
#=> #<Net::HTTPOK 200 OK readbody=true>

response.body
#=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
\"http://www.w3.org/TR/html4/strict.dtd\">\n<html
xmlns:xalan=\"http://xml.apache.org/xalan\"
xmlns:java=\"http://xml.apache.org/xslt/java\" LANG=\"EN\"><head><META
http-equiv=\"Content-Type\" content=\"text/html;
charset=UTF-8\"><title>CHASE Home: Personal Banking | Personal Lending
| Retirement &amp; Investing | Business
Banking</title><script>\n\t\t\t\t\t\tvar pageId = '/online....

<text snipped>
.......

hth
kind regards -botp
 
jackster the jackle

botp wrote in post #960964:
> response = http.get('/index.html')
> #=> #<Net::HTTPOK 200 OK readbody=true>
>
> response.body

That does help and thank you.
Prior to your post, I was able to get the following working which is
similar to what you gave me:
---------
uri = URI.parse("https://www.chase.com/")
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
request = Net::HTTP::Get.new(uri.request_uri)
response = http.request(request)
puts response.body
response["header-here"] # All headers are lowercase
----------
I am able to pull the page from Chase but the https url that I really
need requires a one time username and password which I am having trouble
with.

I tried modifying the uri to be:
---------
uri = URI.parse("https://myusername:[email protected]/")
---------

but that didn't work so I tried adding this which I got from Google:
---------
http.basic_auth("username", "password")
---------

Am I proceeding down the right path with the authentication?

thanks

jack
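
One likely reason the credentials-in-the-URL attempt had no effect: the snippet above passes only `uri.host` and `uri.port` to `Net::HTTP.new` and only `uri.request_uri` to the request, so anything before the `@` is parsed by `URI` but never sent. A sketch of pulling the credentials back out and attaching them to the request explicitly (the host `example.com`, the path and the credentials are placeholders):

```ruby
require 'net/http'
require 'uri'

# Placeholder URL with credentials embedded in the userinfo part.
uri = URI.parse('https://myusername:mypassword@example.com/records.xml')

puts uri.host         # example.com (what Net::HTTP.new receives)
puts uri.user         # myusername
puts uri.password     # mypassword
puts uri.request_uri  # /records.xml (no credentials in the path)

# Attach the credentials to the request explicitly as HTTP Basic auth.
request = Net::HTTP::Get.new(uri.request_uri)
request.basic_auth(uri.user, uri.password)
puts request.key?('authorization')  # true
```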
 
botp

> Am I proceeding down the right path with the authentication?

yes.
just input the command a line at a time... redo back if you get confused...

eg,

require 'net/http'
#=> true
uri = URI.parse("https://www.chase.com/")
#=> #<URI::HTTPS:0x83ea2a8 URL:https://www.chase.com/>
http = Net::HTTP.new(uri.host, uri.port)
#=> #<Net::HTTP www.chase.com:443 open=false>
http.use_ssl = true
#=> true
http.verify_mode = OpenSSL::SSL::VERIFY_NONE
#=> 0
request = Net::HTTP::Get.new(uri.request_uri)
#=> #<Net::HTTP::Get GET>
request.basic_auth("username", "password")
#=> ["Basic dXNlcm5hbWU6cGFzc3dvcmQ="]
response = http.request request
#=> #<Net::HTTPOK 200 OK readbody=true>
response.body
#=> "<!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.01//EN\"
\"http://www.w3.org/TR/html4/strict.dtd\">\n<html
xmlns:xalan=\"http://xml.apache.org/xalan\"
xmlns:java=\"http://xml.apache.org/xslt/java\" LANG=\"EN\"><head><META
http-equiv=\"Content-Type\" content=\"text/html;
charset=UTF-8\"><title>CHASE Home: Personal Banking | Personal Lending
| Retirement &amp; Investing | Business
Banking</title><script>\n\t\t\t\t\t\tvar pageId =
'/online/Home/Chase-Home.dwt';\n\t\t\t\t\t</script><META
name=\"robots\" content=\"INDEX, FOLLOW\"><META name=\"....

<snipped text>....

hth.
kind regards -botp
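
Two details of the walkthrough above are worth spelling out. The value returned by `basic_auth` is just the Base64 encoding of `username:password`, so Basic auth is only as private as the TLS connection carrying it, and `OpenSSL::SSL::VERIFY_NONE` turns off the certificate check that protects that connection. A hedged sketch using `VERIFY_PEER` instead; the URL is a placeholder and the network call is left commented out:

```ruby
require 'net/http'
require 'openssl'
require 'uri'

uri = URI.parse('https://example.com/records.xml')  # placeholder URL

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
http.verify_mode = OpenSSL::SSL::VERIFY_PEER  # check the server certificate

request = Net::HTTP::Get.new(uri.request_uri)
request.basic_auth('username', 'password')

# Decode the header to see exactly what would be sent to the server.
scheme, token = request['authorization'].split(' ', 2)
decoded = token.unpack1('m')
puts scheme   # Basic
puts decoded  # username:password

# response = http.request(request)  # uncomment to perform the request
# puts response.body
```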
 
jackster the jackle

botp wrote in post #960976:
> require 'net/http'
> #=> true
> uri = URI.parse("https://www.chase.com/")
> #=> #<URI::HTTPS:0x83ea2a8 URL:https://www.chase.com/>
> http = Net::HTTP.new(uri.host, uri.port)
> #=> #<Net::HTTP www.chase.com:443 open=false>
> http.use_ssl = true
> #=> true
> http.verify_mode = OpenSSL::SSL::VERIFY_NONE
> #=> 0

That is a good way of doing it, a step at a time through irb...I don't
usually do it that way but it seems to be a good way to isolate the
failure if any.

When I get down to the line "http.use_ssl = true", it fails with this
error message:

irb(main):004:0> http.use_ssl = true
NoMethodError: undefined method `use_ssl=' for #<Net::HTTP myurl.com:443
open=false>
from (irb):4
irb(main):005:0>
 
jackster the jackle

Hi Botp,

I got it working thanks to you. I had to start off with require
'net/https' instead of require 'net/http' and everything worked.

I can't thank you enough for working with me and helping me learn
something.

take care

jack
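
The `NoMethodError` reported earlier is typical of older Ruby releases such as the 1.8 series, where `use_ssl=` was only defined after `require 'net/https'`. On current Ruby versions `net/http` defines it directly, which can be checked without touching the network:

```ruby
require 'net/http'

# true when this Ruby's net/http already includes SSL support
puts Net::HTTP.method_defined?(:use_ssl=)
```

If this prints `false`, falling back to `require 'net/https'` as described in this post is the fix.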







 
Robert Klemme

> botp wrote in post #960964:
>> response = http.get('/index.html')
>> #=> #<Net::HTTPOK 200 OK readbody=true>
>>
>> response.body
>
> That does help and thank you.
> Prior to your post, I was able to get the following working which is
> similar to what you gave me:
> ---------
> uri = URI.parse("https://www.chase.com/")
> http = Net::HTTP.new(uri.host, uri.port)
> http.use_ssl = true
> http.verify_mode = OpenSSL::SSL::VERIFY_NONE
> request = Net::HTTP::Get.new(uri.request_uri)
> response = http.request(request)
> puts response.body
> response["header-here"] # All headers are lowercase
> ----------
> I am able to pull the page from Chase but the https url that I really
> need requires a one time username and password which I am having trouble
> with.
>
> I tried modifying the uri to be:

Both approaches above should be equivalent. The question is which
authentication method the website uses. It may just as well be form
fields (sent via POST, for example) or even a certificate.

Kind regards

robert
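
If the site turns out to use a login form rather than HTTP Basic auth, the credentials travel in the body of a POST instead of a header. A minimal sketch of building such a request; the path `/login` and the field names `username` and `password` are assumptions, since the real form would dictate them:

```ruby
require 'net/http'

# Hypothetical login path and form field names.
request = Net::HTTP::Post.new('/login')
request.set_form_data('username' => 'jack', 'password' => 'secret')

puts request.method           # POST
puts request['content-type']  # application/x-www-form-urlencoded
puts request.body             # username=jack&password=secret

# Sending it works the same way as the GET examples in this thread:
#   http = Net::HTTP.new('example.com', 443)
#   http.use_ssl = true
#   response = http.request(request)
```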
 
