Tidy segfault on Linux

Lee Fyock

Hi--

I'm using the Ruby tidy gem to clean some user-input HTML. It works
splendidly on my Mac development machine, but segfaults on a CentOS
Linux box.

I've tracked through the code, and the crash occurs in Tidybuf.rb's
to_s function. The "struct.bp" method returns a non-nil value (that
indicates a zero size), but the struct.size is some huge number which
varies run-to-run.
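
For anyone wondering how a buffer can show a sane-looking pointer next to a
nonsense size: one generic way to get that symptom (an assumption, not
something this thread confirms for tidy) is a binding that declares a struct
with a different field layout than the compiled C library uses. The sketch
below fakes that with Array#pack. The extra leading field, the helper names,
and all the values are made up for illustration.

```ruby
# Illustration only: hypothetical layouts and values, not taken from tidy's headers.
# Pretend the C library lays the struct out as: extra, bp, size, allocated, next,
# while the Ruby binding declares only:                bp, size, allocated, next.
# Every field is packed as a 32-bit unsigned integer ("L"), as on i386.

# Build the raw bytes the way the (hypothetical) library would.
def pack_library_struct(extra, bp, size, allocated, nxt)
  [extra, bp, size, allocated, nxt].pack("L5")
end

# Read the same bytes the way the (hypothetical) binding would,
# i.e. without knowing about the extra leading field.
def read_with_binding_layout(raw)
  bp, size, allocated, nxt = raw.unpack("L4")
  { :bp => bp, :size => size, :allocated => allocated, :next => nxt }
end

raw  = pack_library_struct(0x0804a000, 0x09f3c2d8, 0, 0, 0)
seen = read_with_binding_layout(raw)

printf("bp   = 0x%08x\n", seen[:bp])   # actually the extra field
printf("size = %d\n", seen[:size])     # actually the bp pointer, read as a length
```

In this sketch the misread size is really a heap address, so it would be
large and would change from run to run. Whether that is what happens in
tidybuf.rb is untested here.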

I've googled a ton, and there are a lot of people who have hit
segfaults using Ruby and tidy. Some of the issues seem to have been a
namespace conflict between Graphics/ImageMagick and Tidy, but we've
fixed that (by renaming tidy's GetToken function and recompiling), and
are still hitting a segfault.

More detail:

Using a fresh Rails 1.2.5 app, I've stepped in console thru the parts
of Tidyobj.rb's clean method, like so:

require 'tidy'
tidy = Tidyobj.new
@doc = Tidylib.create
@outbuf = Tidybuf.new
str = 'hi there!'
rc = -1
rc = Tidylib.parse_string(@doc, str)
rc = Tidylib.clean_and_repair(@doc) if rc >= 0
rc = (Tidylib.opt_parse_value(@doc, :force_output, true) == 1 ? rc : -1) if rc > 1
rc = Tidylib.save_buffer(@doc, @outbuf.struct) if rc >= 0

At this point:

=> #<DL::ptrData:0x0x949aa38 ptr=0x0x29c4d0 size=0 free=0x(nil)>

Then:

/usr/lib/ruby/site_ruby/1.8/tidy/tidybuf.rb:39: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
Aborted (core dumped)

The shorter way to reproduce this is:

/usr/lib/ruby/site_ruby/1.8/tidy/tidybuf.rb:39: [BUG] Segmentation fault
ruby 1.8.4 (2005-12-24) [i386-linux]
Aborted (core dumped)

If anyone has a clue, please help!

Thanks,
Lee
 
Bob Foxworthington

Hi Lee & Scott,

We ran into the exact same issue Lee described in
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/282246

We tried the patch as suggested by Scott in
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/282454

Unfortunately, no luck... we still segfault.

Just wondering if you were able to solve it... and if so, how?

We have Ruby 1.8.5, Tidy Gem 1.1.2, libtidy 0.99 on CentOS 5.2 (x86_64).

One thing we found is that Tidy seems to only segfault if one feeds it
*valid* HTML. If one feeds it bad HTML, it doesn't crash (see example
below).

Thanks!

Bob

---8<-------------------------------------------------------------------------

require "rubygems"
require "tidy"

Tidy.path = "/usr/lib64/libtidy.so"

Tidy.open do |t|
  puts "*** BAD SAMPLE"
  t.clean "<html>I am bad HTML!</html>"
  puts t.errors
  puts t.diagnostics
end

Tidy.open do |t|
  puts "*** GOOD SAMPLE"
  t.clean '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' +
          '<html><head><title>foo</title></head><body><p>bar</p></body></html>'
  puts t.errors
  puts t.diagnostics
end

---8<-------------------------------------------------------------------------

Outputs the following:

*** BAD SAMPLE
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 1 column 7 - Warning: plain text isn't allowed in <head> elements
line 1 column 7 - Info: <head> previously mentioned
line 1 column 7 - Warning: inserting implicit <body>
line 1 column 7 - Warning: inserting missing 'title' element
Info: Document content looks like HTML 3.2
4 warnings, 0 errors were found!

*** GOOD SAMPLE
(eval):5: [BUG] Segmentation fault
ruby 1.8.5 (2006-08-25) [x86_64-linux]

Aborted
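
When narrowing down which inputs crash, as with the two samples above, it can
help to run each candidate in a child process so that a segfault does not take
the driver script down with it. A minimal sketch, assuming a platform with
fork. `run_isolated` is a hypothetical helper, and the real tidy call is left
as a comment so the snippet runs without the gem.

```ruby
# Run the given block in a forked child and report how the child ended.
# Returns :ok, [:exit, code], or [:signal, number].
def run_isolated
  pid = fork do
    yield
    exit!(0)   # leave the child immediately, skipping at_exit handlers
  end
  _, status = Process.wait2(pid)
  if status.signaled?
    [:signal, status.termsig]
  elsif status.success?
    :ok
  else
    [:exit, status.exitstatus]
  end
end

samples = {
  "bad"  => "<html>I am bad HTML!</html>",
  "good" => '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">' +
            '<html><head><title>foo</title></head><body><p>bar</p></body></html>'
}

samples.each do |name, html|
  result = run_isolated do
    # Tidy.open { |t| t.clean(html) }   # the real call would go here
    html.length                         # stand-in so the sketch runs without tidy
  end
  puts "#{name}: #{result.inspect}"
end
```

A child that dies from a signal comes back as a [:signal, n] pair, so the
loop keeps going and the crashing inputs can be listed in one run.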
 
