Mechanize

Ramiro Diaz Trepat

Hello list,
I need to parse the contents of an unusual web site that stores a
session id 80,000 characters long in a hidden input tag.
I am trying to use Mechanize for the task, but because line 12 of
the page is 80k characters long, I get the following error:

/usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `scan': ran out of buffer space on element <input>, starting on line 12. (Hpricot::parseError)
from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `make'
from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:15:in `parse'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize/page.rb:37:in `initialize'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `new'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `fetch_page'
from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
from /usr/lib/ruby/1.8/net/http.rb:2133:in `reading_body'
from /usr/lib/ruby/1.8/net/http.rb:1049:in `request'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:514:in `fetch_page'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:185:in `get'


Presumably Hpricot reads into a fixed-size buffer that can't hold a
line this long.


The "program" is this simple test script:

require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("https://replica.megsa.com.ar/Usuario/MantenimientoContratos.aspx")
puts page.body


Is there a way to configure Hpricot to use a dynamically sized
buffer instead?
 
Aaron Patterson

You can raise Hpricot's buffer size:

Hpricot.buffer_size = 2621444

http://code.whytheluckystiff.net/hpricot/ticket/13

I think that should fix your issue.
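
For picking a value, here is a rough sketch that measures the longest tag in a saved copy of the page and rounds up to the next power of two. The helper name suggested_buffer_size and its 16_384 starting value are assumptions made for illustration, not part of Hpricot or Mechanize, and the sample HTML only imitates the page described above.

```ruby
# Hypothetical helper (not part of Hpricot or Mechanize): estimate a buffer
# size that is strictly larger than the longest tag in an HTML string.
# The starting value of 16_384 is an assumption chosen for this sketch.
def suggested_buffer_size(html, minimum = 16_384)
  # Byte length of the longest "<...>" run, or 0 if the string has no tags.
  longest = html.scan(/<[^>]*>/).map { |tag| tag.bytesize }.max || 0
  size = minimum
  # Double until the longest tag fits with room to spare.
  size *= 2 while size <= longest
  size
end

# Stand-in for the real page: one hidden input with an 80,000-character value.
html = %(<form><input type="hidden" name="sid" value="#{'x' * 80_000}"></form>)
size = suggested_buffer_size(html)
puts size

# Apply the setting only when Hpricot is actually loaded.
Hpricot.buffer_size = size if defined?(Hpricot)
```

Whatever value you choose, the assignment has to run before agent.get, because Mechanize parses the page as part of the fetch.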
 
