Jian Lin
I was trying to convert UTF-8 content into a series of entities like
&#x525B; so that whatever the page encoding is, the characters would
show, so I used something like this:
<%
begin
  t = ''
  s = Iconv.conv("UTF-32", "UTF-8", some_utf8_string)
  s.scan(/(.)(.)(.)(.)/) do |b1, b2, b3, b4|
    t += ("&#x" + "%02X" % b3.ord) + ("%02X" % b4.ord) + ";"
  end
rescue => details
  t = "exception " + details
end
%>
<%= t %>
But some characters get converted and some don't. Is it true that
(.)(.)(.)(.) will not necessarily match 4 bytes at a time?
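One likely cause, sketched here under the assumption of Ruby 1.9 or later (where String#encode stands in for Iconv): without the /m flag, "." does not match a newline, so any character whose UTF-32 bytes include 0x0A, such as U+4E0A, fails the four-group pattern. The sample character and variable names below are illustrative only.

```ruby
# Illustrative sample: U+4E0A encodes to the UTF-32BE bytes 00 00 4E 0A.
sample = "\u4E0A"
bytes  = sample.encode("UTF-32BE").force_encoding("BINARY")

# Without /m, "." refuses to match the 0x0A byte, so no 4-byte group is found.
without_m = bytes.scan(/(.)(.)(.)(.)/).length

# With /m, "." matches any byte, including 0x0A, so the group is found.
with_m = bytes.scan(/(.)(.)(.)(.)/m).length

puts "without /m: #{without_m} match(es), with /m: #{with_m} match(es)"
```

Two other things may also be in play, depending on the setup: a plain "UTF-32" target can make iconv prepend a byte order mark and pick a platform-dependent byte order (naming "UTF-32BE" avoids both), and under Ruby 1.8 with $KCODE set to 'u', "." matches a whole UTF-8 character rather than a single byte.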
At first, I was going to use
s = Iconv.conv("UTF-16", "UTF-8", some_utf8_string)
but then I found that UTF-16 is also variable length, so I used UTF-32
instead, which is fixed length. The UTF-8 string I have only contains
characters from the Basic Multilingual Plane, so they should all be in
the 0x0000 to 0xFFFF range in Unicode.
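Since only the code point of each character is needed, a possible alternative is to skip the intermediate conversion and read the code points straight from the UTF-8 string with String#unpack and the "U*" directive. This is a minimal sketch, not the original approach; the helper name to_hex_entities is hypothetical.

```ruby
# Hypothetical helper: turns every character of a UTF-8 string into a
# hexadecimal character reference, padded to at least four hex digits.
def to_hex_entities(utf8_string)
  # "U*" decodes the UTF-8 bytes into an array of integer code points.
  utf8_string.unpack("U*").map { |cp| format("&#x%04X;", cp) }.join
end

puts to_hex_entities("\u525B")
```

Because it works on code points rather than bytes, the same sketch would also handle characters outside the 0x0000 to 0xFFFF range, and it does not depend on byte order or on how "." treats newlines.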