nutsmuggler
Hello folks.
I managed to write an SGML parser with the Hpricot library. As I
explained in a previous thread, I just need to compare the source and
target tags of translation memory files from IBM Translation Manager.
The script now runs correctly, but I realised that it cannot cope
with large files; I tried to process a TM file larger than 1 MB and the
script took ages to generate the output. Should I switch to a compiled
language for this specific task?
At any rate, here is the script; it's very basic. Please let me know
if I did something wrong or if its slowness is a necessary drawback of
Ruby being interpreted. Cheers,
Davide
#!/usr/local/bin/ruby
require 'rubygems'
require 'hpricot'
$pattern = "server"
result = File.new("result.html", "w")
$stdout = result
puts "<!DOCTYPE HTML PUBLIC '-//W3C//DTD HTML 4.01//EN'
'http://www.w3.org/TR/html4/strict.dtd'>\n
<head>\n
<meta http-equiv='Content-type' content='text/html; charset=utf-8'>\n
<title>Search for '#{$pattern}'</title>\n
<style type='text/css'>
body {
}
p {
margin: 0px;
}
p.source {
background: #FFFFCC;
padding: 10px 5px 10px 5px;
}
p.target {
background: #F8A271;
padding: 10px 5px 10px 5px;
}
span.pattern {
background: #B6B6B6;
}
</style>
</head>\n
<body>\n"
# to read from stdin instead
# doc = Hpricot.XML(STDIN)
doc = Hpricot.XML(open("bch01aad006_MEMORIA.EXP"))
doc.search("Source").each do |item|
  if item.innerHTML =~ /#{$pattern}/
    highlightedSource = item.innerHTML.gsub(/#{$pattern}/, "<span class='pattern'>#{$pattern}</span>")
    puts "<p class='source'>EN: #{highlightedSource}</p>\n"
    puts "<p class='target'>IT: #{item.next_sibling.html}</p>\n<hr/>"
  end
end
puts "</body>"
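
For comparison, here is a minimal sketch of the same source/target lookup
done without building a document tree. It is only a sketch under
assumptions: it assumes every <Source>...</Source> element is directly
followed by its <Target>...</Target> element (which is what the script
above relies on through next_sibling), and that neither tag carries
attributes. The names PAIR_RE, matching_pairs and highlight are
hypothetical helpers, not part of Hpricot or of the export format.

```ruby
# Assumed layout: <Source>...</Source> followed (after optional
# whitespace) by <Target>...</Target>. The /m flag lets "." span newlines.
PAIR_RE = %r{<Source>(.*?)</Source>\s*<Target>(.*?)</Target>}m

# Returns an array of [source, target] pairs whose source text
# contains +pattern+ (treated as a literal string, not a regex).
def matching_pairs(text, pattern)
  needle = Regexp.new(Regexp.escape(pattern))
  text.scan(PAIR_RE).select { |source, _target| source =~ needle }
end

# Wraps every literal occurrence of +pattern+ in a highlighting span.
def highlight(source, pattern)
  source.gsub(pattern) { |match| "<span class='pattern'>#{match}</span>" }
end

# Possible usage (file name taken from the script above):
#   text = File.read("bch01aad006_MEMORIA.EXP")
#   matching_pairs(text, "server").each do |source, target|
#     puts "<p class='source'>EN: #{highlight(source, 'server')}</p>"
#     puts "<p class='target'>IT: #{target}</p>\n<hr/>"
#   end
```

Whether this is actually faster on a large TM file would have to be
measured; it simply avoids the per-element sibling lookup.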