Andrew said:
Hello,
I have an example program below that contains weird Icelandic
characters, and a copyright symbol, just for good measure. The code
expresses these as UTF8. They print exactly as you would want/expect
them to. So far so good. But what I want is to be able to go the other
way. I want to take a Unicode string and recreate the escape sequences
for the funny international characters. For example, the single
character E-acute should be expanded to \u00C9 (6 characters). Any
ideas on how to do this please?
public class UTF8Test {
    public UTF8Test() {
    }
    public String getString() {
        StringBuilder builder = new StringBuilder();
        builder.append("Copyright \u00A9 2009\n");
        builder.append("Here is the phrase (in Icelandic): I can eat glass and it doesn't hurt me\n");
        builder.append("\u00C9g get eti\u00F0 gler \u00E1n \u00FEess a\u00F0 mei\u00F0a mig");
        return builder.toString();
    }
    public static void main(String[] args) {
        UTF8Test test = new UTF8Test();
        System.out.println(test.getString());
    }
}
FWIW, the reason I want to do this is I need to write strings like
this to a Sybase table where the column is of type varchar. We cannot
make it univarchar (don't ask). So I need to be able to write Unicode
characters without using Unicode chars! I thought by having them in
this expanded form Java can convert them just like the program above
does.
public class UTF8Test {
    public UTF8Test() {
    }
    public String getString() {
        StringBuilder builder = new StringBuilder();
        builder.append("Copyright \u00A9 2009\n");
        builder.append("Here is the phrase (in Icelandic): I can eat glass and it doesn't hurt me\n");
        builder.append("\u00C9g get eti\u00F0 gler \u00E1n \u00FEess a\u00F0 mei\u00F0a mig");
        return builder.toString();
    }
    public static void main(String[] args) {
        UTF8Test test = new UTF8Test();
        final String what = test.getString();
        System.out.println(what);
        for (int jj = 0; jj < what.length(); ++jj) {
            final char which = what.charAt(jj);
            if (which == '\n') {
                System.out.print("\\n");
            } else if (which >= ' ' && which <= 0x7E) {
                System.out.print(which);
            } else {
                System.out.printf("\\u%04X", (int) which);
            }
        }
        System.out.println();
    }
}
I think all the talk of UTF-8 and UTF-16 and encoding and system
properties is off the mark. I think this is what you're looking for.
It prints:
Copyright ? 2009
Here is the phrase (in Icelandic): I can eat glass and it doesn't hurt me
?g get eti? gler ?n ?ess a? mei?a mig
Copyright \u00A9 2009\nHere is the phrase (in Icelandic): I can eat glass and it doesn't hurt me\n\u00C9g get eti\u00F0 gler \u00E1n \u00FEess a\u00F0 mei\u00F0a mig
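A caveat on the round trip: the Java compiler translates \uXXXX escapes only in source code, so text read back from the varchar column at run time stays as literal backslash sequences until something decodes it. Below is a minimal sketch of a matching pair of helpers. The class name UnicodeEscapes and the methods escape and unescape are made up for illustration and are not part of any library. Unlike the loop above, this sketch writes a newline as \u000A and also escapes the backslash itself (as \u005C); both are assumptions made so that decoding is unambiguous.

```java
// Illustrative sketch only: class and method names are hypothetical.
class UnicodeEscapes {

    // Encode: keep printable ASCII, turn everything else (and the
    // backslash itself) into a six-character backslash-u escape.
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= ' ' && c <= 0x7E && c != '\\') {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    // Decode: replace each well-formed backslash-u escape with the
    // character it names; anything else is copied through unchanged.
    public static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            if (text.startsWith("\\u", i) && isHex4(text, i + 2)) {
                out.append((char) Integer.parseInt(text.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // True if four hexadecimal digits start at the given index.
    private static boolean isHex4(String s, int start) {
        if (start + 4 > s.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String original = "\u00C9g get eti\u00F0 gler";
        String escaped = escape(original);
        System.out.println(escaped);
        System.out.println(unescape(escaped).equals(original));
    }
}
```

With helpers like these, escape would be called on the string before it is written to the varchar column, and unescape on the value read back.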