A question about Iconv arguments

Axel Etzold

Dear all,

I need to convert some accented text, and I would like to know
what arguments I have to give Iconv to produce the desired output.
E.g., in Italian, the word for Friday is "venerdì", where the
final "i" carries a grave accent.
If you type this into the Italian Wikipedia search
(which I believe uses UTF-8 encoding),
it will load:

http://it.wikipedia.org/wiki/Venerdì ,

yet this syntax:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me "venerd\303\254" when I convert from latin1 encoding.

What arguments do I have to use?

Thank you,

Best regards,

Axel
 
Alex Young

Axel said:
Dear all,

I need to convert some accented text, and I would like to know
what arguments I have to give Iconv to produce the desired output.
E.g., in Italian, the word for Friday is "venerdì", where the
final "i" carries a grave accent.
If you type this into the Italian Wikipedia search
(which I believe uses UTF-8 encoding),
it will load:

http://it.wikipedia.org/wiki/Venerdì ,

yet this syntax:

converted_doc = Iconv.new(output_encoding, input_encoding).iconv(doc)

gives me "venerd\303\254" when I convert from latin1 encoding.

That looks right to me: if I write that into a UTF-8 HTML document, it
displays correctly. What are you expecting?
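
To see why that output is already valid UTF-8, it may help to look at the bytes directly. The following is only a minimal sketch, assuming Ruby 1.9 or later, where String#encode stands in for Iconv (Iconv was removed from the standard library in Ruby 2.0). The variable names are illustrative and not from the original post.

```ruby
# Assumption: Ruby 1.9+; String#encode is used here in place of Iconv.
# "venerdì" as Latin-1 bytes: the final byte 0xEC is "ì" in ISO-8859-1.
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)

# Transcode to UTF-8, the same conversion as Iconv.new('utf-8', 'latin1').
utf8 = latin1.encode(Encoding::UTF_8)

# "ì" becomes the two bytes 0xC3 0xAC, which Ruby 1.8 displays as \303\254.
last_two_octal = utf8.bytes.last(2).map { |b| format("\\%o", b) }.join
puts last_two_octal
```

So the escaped form shown by Ruby 1.8 is just the octal notation for the two UTF-8 bytes of "ì", not a sign of a failed conversion.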
 
Axel Etzold

Dear Alex,

thank you for responding.
If I try to fetch a web page that has accents in its address,
like this:

require "rubygems"
require "rio"
require 'iconv'
output_encoding = 'utf-8'
doc="Venerdì"
converted_doc = Iconv.new(output_encoding, 'latin1').iconv(doc)
rio("http://www.wikipedia.org/wiki/" + converted_doc)>rio("a.html")

I get an error message:

/usr/local/lib/ruby/1.8/uri/common.rb:436:in `split': bad URI(is not URI?): http://www.wikipedia.org/wiki/venerdì (URI::InvalidURIError)
from /usr/local/lib/ruby/1.8/uri/common.rb:485:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/withpath.rb:285:in `uri_from_string_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:74:in `arg0_info_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:83:in `init_from_args_'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/uri.rb:56:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/base.rb:80:in `parse'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/rl/builder.rb:111:in `build'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/factory.rb:412:in `create_state'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:65:in `initialize'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `new'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio.rb:76:in `rio'
from /usr/local/lib/ruby/gems/1.8/gems/rio-0.4.0/lib/rio/kernel.rb:42:in `rio'


This doesn't happen if I type in:

rio("http://www.wikipedia.org/wiki/Venerdì")>rio("a.html")

So I need to know what conversion arguments I need to give Iconv to
turn "Venerdì" into "Venerd%C3%AC".

Best regards,

Axel
 
Axel Etzold

I've managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'


def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10
      binary<<'A%'
    elsif digit==11
      binary<<'B%'
    elsif digit==12
      binary<<'C%'
    elsif digit==13
      binary<<'D%'
    elsif digit==14
      binary<<'E%'
    elsif digit==15
      binary<<'F%'
    end
    number=(number-digit)/16
  end
  return binary.reverse.gsub(/%([A-F])%([A-F])/,'%\1\2')
end

class String
  def wiki_addr
    converted_doc = Iconv.new('utf-8', 'latin1').iconv(self)
    res=''
    converted_doc.split(//).each{|x|
      if /[a-zA-Z0-9\_ ]/.match(x)
        res<<x
      else
        res<<to_hex(x[0])
      end
    }
    return res
  end
end


doc ="venerdì"
doc.wiki_addr
rio("http://it.wikipedia.org/wiki/"+ doc.wiki_addr)>rio("a.html")

Best regards,

Axel
 
Stefan Rusterholz

Axel said:
I've managed to solve this problem like this:

require "rubygems"
require "rio"
require 'iconv'


def to_hex(number)
  number=number.abs
  binary=''
  while number>0
    digit=number%16
    if digit<10
      binary<<digit.to_s
    elsif digit==10
...

I guess you're aware of neither:
1234.to_s(16)
nor:
"%x" % 1234

For situations like the above, even a lookup-array or a case/when would
be better.
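
As a rough illustration of that suggestion, here is a minimal sketch of how a single byte could be turned into its percent-escaped form with format. The helper name percent_encode_byte is hypothetical and not part of the thread or of any library.

```ruby
# Hypothetical helper (not from the thread): percent-encode one byte value.
# "%%" emits a literal percent sign; "%02X" is two upper-case hex digits.
def percent_encode_byte(byte)
  format("%%%02X", byte)
end

# The two built-in conversions mentioned above give the same hex string.
hex_a = 1234.to_s(16)
hex_b = "%x" % 1234

# The two UTF-8 bytes of "ì" (0xC3, 0xAC) as a percent-escaped sequence.
escaped = [0xC3, 0xAC].map { |b| percent_encode_byte(b) }.join
puts escaped
```

Zero-padding to two digits means bytes below 0x10, and bytes whose high nibble is a decimal digit, are escaped in the same way as the rest.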

Regards
Stefan
 
Axel Etzold

Dear Stefan,

thank you for bringing this to my attention!
(Slightly varying Voltaire, I might
have been able to write a shorter
program had I had more leisure and
more knowledge).
I'll try your suggestion.
Best regards,

Axel
 
Nobuyoshi Nakada

Hi,

At Sun, 10 Jun 2007 18:05:49 +0900,
Axel Etzold wrote in [ruby-talk:254981]:
I've managed to solve this problem like this:

$ ruby -riconv -rcgi -e 'puts CGI.escape(Iconv.conv("utf-8", "latin1", "venerd\354"))'
venerd%C3%AC
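
For readers on Ruby 1.9 or later, where Iconv is no longer part of the standard library, the same idea could be sketched roughly as below. This is an assumption-laden variant, not the one-liner above: String#encode is swapped in for Iconv.conv, and URI.encode_www_form_component is swapped in for CGI.escape. The variable names are illustrative.

```ruby
require "uri"

# Assumption: Ruby 1.9+, where String#encode handles the transcoding.
# Start from the Latin-1 bytes of "venerdì" (0xEC is "ì" in ISO-8859-1).
latin1 = "venerd\xEC".dup.force_encoding(Encoding::ISO_8859_1)
utf8 = latin1.encode(Encoding::UTF_8)

# Percent-encode the UTF-8 bytes; used here in place of CGI.escape.
escaped = URI.encode_www_form_component(utf8)
puts escaped

# wiki_url is an illustrative name; the host is the one used in the thread.
wiki_url = "http://it.wikipedia.org/wiki/" + escaped
```

Like CGI.escape, this function encodes a space as "+", so a title containing spaces may need them replaced with underscores first.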
 
