HTML-Parser / SGML-Parser

Zach Dennis

Ok, silly question.

I am writing a script to determine my router's WAN IP address and then
email me once an hour in case it changes. Currently I am running a web
server at work that returns a page with the client's IP address. I need to
parse the page so I can extract the IP address of my router
when my script/program connects.

I am using the html-parser, sgml-parser and formatter Ruby libraries
from the RAA, and I have made the changes to the regexp regarding image
width and height. So I'm good there.

In my test.rb file I say:
------------------------------------------------
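require 'net/http'
require 'html-parser'   # html-parser and formatter come from the RAA package
require 'formatter'

h = Net::HTTP.new('www.zachstestip.com', 80)
resp, data = h.get('/index.php', nil)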

w = DumbWriter.new
f = AbstractFormatter.new(w)
p = HTMLParser.new(f)
p.feed(data)
p.close
------------------------------------------------

Here comes the silly part. The function "feed" is inherited by html-parser
from sgml-parser. It passes "data" along to the sgml-parser function
"goahead". Everything gets printed to stdout or stderr (I don't know which,
but it makes it to my screen =), yet there is no print, puts, etc. call to
send it there! I can't for the life of me determine where the feed or
goahead functions output my parsed results from data. This is damn
silly of me to ask, I know, but how is it getting to my CLI?

In the "goahead" function there is a giant while loop. If I place a print or
puts statement right before the loop and right after the loop, then
nothing is output (except for my explicit print/puts statements).

Am I losing it?

Zach
 
Sean O'Dell

Why not just qualify your IP address with something like >>>>IP<<<< and
then you can regex for it like this:

match = />>>>(.+)<<<</.match(html)   # html holds the page body

match[1]   # => your IP address
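
A minimal sketch of this marker approach, assuming the page wraps the address in the literal markers >>>> and <<<<. The sample page body and the helper name extract_marked_ip are made up for illustration. It uses a non-greedy match so that a stray "<" right after the closing marker (e.g. from a following tag) does not end up in the captured text:

```ruby
# Hypothetical helper: pull the text between >>>> and <<<< out of a page body.
# Returns nil when the markers are not present.
def extract_marked_ip(html)
  match = />>>>(.+?)<<<</.match(html)
  match && match[1].strip
end

# Made-up page body standing in for what the web server would return.
sample_page = "<html><body>Your address: >>>>203.0.113.7<<<<</body></html>"

puts extract_marked_ip(sample_page)
```

Returning nil instead of raising makes it easy for the calling script to skip a run when the page comes back in an unexpected shape.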

Sean O'Dell
 
Ara.T.Howard

Zach said:
Ok, silly question.

I am writing a script to determine my router's WAN ip address and then to
email me once an hour in case it changes. Currently I am running a web
server at work that returns a page with the client's ip address. I need to
parse out the info on the page so I can extract the ip address of my router
when my script/program connects.

check out dyndns.org - they have scripts for just about every router that do
this.

Zach said:
I am using the html-parser, sgml-parser and formatter ruby libraries
provided from raa and I have made the changes to the regexp regarding image
width and height. So I'm good there.

In my test.rb file I say:
------------------------------------------------
h = Net::HTTP.new('www.zachstestip.com' , 80 )
resp,data = h.get('/index.php' , nil )

w = DumbWriter.new
f = AbstractFormatter.new(w)
p = HTMLParser.new(f)
p.feed(data)
p.close
------------------------------------------------

one thing i might point out here - i myself have spent hours trying to figure
out weird bugs after naming a variable 'p'. worth a check...

Zach said:
Here comes the silly part. The function "feed" is inherited by sgml-parser
to html-parser. It passes "data" along to the sgml-parser function
"goahead". It prints everything to stdout or stderr( i dont know, but it
makes it to my screen =), but there is no print, put, etc... etc... call to
send it there!!! I cant for the life of me determine where in the feed or
goahead functions are outputting my parsed results from data! This is damn
silly of me to ask I know, but how is it getting to my CLI?

In the "goahead" function there is a giant while loop. If i place a print or
puts statement at the right before the loop and right after the loop, then
nothing is outputted( except for my explicit print/puts statements).

you could also try something like this to track the problem:

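# keep the original methods reachable under new names
alias __p p
alias __print print
alias __puts puts

# report the call stack on STDERR, then delegate to the original method
def p(*args); STDERR.puts(caller.join("\n")); __p(*args); end
def print(*args); STDERR.puts(caller.join("\n")); __print(*args); end
def puts(*args); STDERR.puts(caller.join("\n")); __puts(*args); end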

i'm not sure you'd need all three but... you get the picture.
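
The aliases above only catch calls to the bare p, print and puts methods. If the output is produced by calling write on an IO object directly, a sketch along the following lines might help instead. WriteTracer and write_callers are hypothetical names, and a StringIO stands in for $stdout so the demo stays quiet:

```ruby
require 'stringio'

# Hypothetical helper module: records who called #write on an IO-like object,
# then passes the data through unchanged.
module WriteTracer
  def write_callers
    @write_callers ||= []
  end

  def write(*args)
    write_callers << caller(1, 1).first  # location of the direct caller of #write
    super
  end
end

# StringIO stands in for $stdout here; in a real session one could try
# $stdout.extend(WriteTracer) instead.
out = StringIO.new
out.extend(WriteTracer)

out.write("hello\n")

# Show where the write came from.
STDERR.puts out.write_callers
```

Extending a single object keeps the tracing local to that stream, so other IO objects in the program are left alone.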

-a
====================================
| Ara Howard
| NOAA Forecast Systems Laboratory
| Information and Technology Services
| Data Systems Group
| R/FST 325 Broadway
| Boulder, CO 80305-3328
| Email: (e-mail address removed)
| Phone: 303-497-7238
| Fax: 303-497-7259
| The difference between art and science is that science is what we understand
| well enough to explain to a computer. Art is everything else.
| -- Donald Knuth, "Discover"
| ~ > /bin/sh -c 'for lang in ruby perl; do $lang -e "print \"\x3a\x2d\x29\x0a\""; done'
====================================
 
Steven Jenkins

This doesn't answer your questions about Ruby, but most of what you want
exists already.

Look at http://www.dyndns.org. I've been using them for a year or so.
Every 5 minutes, a Perl daemon (ddclient) on my system wakes up, grabs
the WAN address from my Linksys box, and if it's changed, updates
dyndns. I can ssh into my system at home using the name
'tidal.dyndns.org', even though the address actually belongs to my ISP.
It works great, and it's free.
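
For anyone who still wants to roll their own, the "has it changed?" step can be sketched roughly as below. This is not how ddclient does it; the helper name address_changed? and the state file are assumptions made for illustration, and the addresses are made-up documentation addresses:

```ruby
require 'tmpdir'

# Hypothetical helper: remember the last seen address in a state file and
# report whether the new one differs from it.
def address_changed?(state_file, current_ip)
  previous = File.exist?(state_file) ? File.read(state_file).strip : nil
  changed = previous != current_ip
  File.write(state_file, current_ip) if changed
  changed
end

# Demo in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |dir|
  state = File.join(dir, "last_ip.txt")
  puts address_changed?(state, "203.0.113.7")   # first run: nothing stored yet
  puts address_changed?(state, "203.0.113.7")   # same address again
  puts address_changed?(state, "203.0.113.8")   # address moved
end
```

A script run from cron could send the email only when this returns true, instead of once an hour regardless.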

Steve
 
Ben Giddings

Zach said:
I am using the html-parser, sgml-parser and formatter ruby libraries
provided from raa and I have made the changes to the regexp regarding image
width and height. So I'm good there.

I think the HTML parser might be abandoned (RAA says the last update was
2001-07-10 13:35:40 GMT).

You might have better luck using (my) htmltokenizer. It has a really
simple interface, and it might be more what you need:

http://raa.ruby-lang.org/list.rhtml?name=htmltokenizer

If you really want to use the html-parser, sorry, I can't help you. I
never managed to understand how to work it, which is why I ported the
htmltokenizer.

Ben
 
Bernard Delmée

Zach said:
------------------------------------------------
h = Net::HTTP.new('www.zachstestip.com' , 80 )
resp,data = h.get('/index.php' , nil )

w = DumbWriter.new
f = AbstractFormatter.new(w)
p = HTMLParser.new(f)
p.feed(data)
p.close
------------------------------------------------

Here comes the silly part. The function "feed" is inherited by
sgml-parser to html-parser. It passes "data" along to the sgml-parser
function "goahead". It prints everything to stdout or stderr( i dont
know, but it makes it to my screen =), but there is no print, put,
etc... etc... call to send it there!!! I cant for the life of me
determine where in the feed or goahead functions are outputting my
parsed results from data! This is damn silly of me to ask I know, but
how is it getting to my CLI?

Through the DumbWriter. Check its implementation in
....\Ruby\lib\ruby\site_ruby\formatter.rb
that's where the "write" statements live.
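
To illustrate the idea, here is a sketch with stand-in classes. TinyWriter and TinyFormatter are hypothetical and are NOT the real formatter.rb classes; they only mimic the parser -> formatter -> writer chain, where the writer is the only piece that touches an output stream. The Python original of DumbWriter takes an optional file argument; whether the Ruby port does too is something to confirm in formatter.rb.

```ruby
require 'stringio'

# Hypothetical stand-in for a writer: the only place where output happens.
class TinyWriter
  def initialize(io = $stdout)
    @io = io
  end

  def send_text(text)
    @io.write(text)
  end
end

# Hypothetical stand-in for a formatter: tidies text and hands it to the writer.
class TinyFormatter
  def initialize(writer)
    @writer = writer
  end

  def add_text(text)
    @writer.send_text(text.gsub(/\s+/, ' '))  # collapse runs of whitespace
  end
end

# Pointing the writer at a StringIO keeps the text in memory instead of
# sending it to the terminal.
buffer = StringIO.new
formatter = TinyFormatter.new(TinyWriter.new(buffer))
formatter.add_text("Your IP is\n  203.0.113.7")

puts buffer.string
```

With the text in a string, the address can then be picked out with a regexp rather than read off the screen.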

Often, when you want to parse HTML, it is simpler to use
the (misleadingly named) SGMLParser. Anyway, these libraries are
direct ports of Python modules, and can only be understood by
checking the documentation of the originals.
See e.g. http://www.python.org/doc/1.5.2/lib/module-sgmllib.html
And usage examples (in Python ;-)
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52281
http://www.oreilly.com/catalog/pythonsl/chapter/ch05.html#t4

Cheers,

Bernard.
 
