Ramiro Diaz Trepat
Hello list,
I need to parse the contents of a weird web site that stores a
session id 80,000 characters long in a hidden input tag.
I am trying to use Mechanize for the task, but since line 12 of the
page is about 80k characters long, I get the following error:
/usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `scan': ran out of buffer space on element <input>, starting on line 12. (Hpricot::ParseError)
from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:51:in `make'
from /usr/lib/ruby/gems/1.8/gems/hpricot-0.6/lib/hpricot/parse.rb:15:in `parse'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize/page.rb:37:in `initialize'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `new'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:551:in `fetch_page'
from /usr/lib/ruby/1.8/net/http.rb:1050:in `request'
from /usr/lib/ruby/1.8/net/http.rb:2133:in `reading_body'
from /usr/lib/ruby/1.8/net/http.rb:1049:in `request'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:514:in `fetch_page'
from /usr/lib/ruby/gems/1.8/gems/mechanize-0.6.11/lib/mechanize.rb:185:in `get'
The line buffer in Hpricot is probably a fixed-size buffer that
cannot hold a line this long.
The "program" is this simple test script:
require 'rubygems'
require 'mechanize'
agent = WWW::Mechanize.new
agent.user_agent_alias = 'Mac Safari'
page = agent.get("https://replica.megsa.com.ar/Usuario/MantenimientoContratos.aspx")
puts page.body
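To double-check that line 12 really is the oversized one, independently of Hpricot, something like the sketch below could be run on the raw response body. It uses only core Ruby; the helper name longest_line and the synthetic sample string are made up for illustration and are not part of Mechanize or Hpricot. It also assumes a Ruby recent enough for each_line to return an enumerator when called without a block.

```ruby
# Hypothetical helper: report the longest line in a string.
# Returns [line_number, length]; line numbers are 1-based,
# which is how the parser error above counts lines.
def longest_line(text)
  best_number = 0
  best_length = 0
  text.each_line.with_index(1) do |line, number|
    length = line.chomp.length
    if length > best_length
      best_number = number
      best_length = length
    end
  end
  [best_number, best_length]
end

# Synthetic stand-in for the real page: 11 short lines followed by
# one hidden input whose value is 80,000 characters long.
sample = ("<p>short</p>\n" * 11) +
         '<input type="hidden" value="' + ("x" * 80_000) + "\" />\n"

number, length = longest_line(sample)
puts "longest line: #{number} (#{length} characters)"
```

With the real site, the same helper could be fed the body fetched directly through Net::HTTP, since the Mechanize call fails before a page object is returned.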
Is there a way to configure Hpricot to use a dynamically sized
collection for the line buffer?