... which I'm having trouble getting my head around.
I have a String which is a single character, e.g. "a"
I need to convert it to a String which is the decimal representation of
the UTF8 ascii code, i.e. "97"
What did you do to try to solve your problem?
As Mayeul pointed out, "UTF8 ascii code" [sic] doesn't mean anything.
ASCII is a code defining 128 characters, each of which is usually
stored in 8 bits with the most significant bit set to 0. But in any
case "ASCII the character set" should not be confused with "ASCII the
encoding".
Same for Unicode.
Unicode defines many more entities (called code points).
The first 128 Unicode code points are the 128 ASCII characters.
UTF-8 is an encoding that was designed so that any byte
with the most significant bit set to 0 is an ASCII character.
So a UTF-8 encoded file containing only ASCII characters is
byte-for-byte the same as an ASCII encoded file.
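To make the point about ASCII-only text concrete, here is a minimal sketch that encodes the same String both ways and compares the bytes. The class and method names are made up for illustration; only the standard getBytes(Charset) and Arrays.equals calls are relied on.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the original thread.
class AsciiUtf8Demo {
    // True when the UTF-8 and US-ASCII encodings of s are byte-for-byte identical.
    static boolean sameBytesInBothEncodings(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(utf8, ascii);
    }

    public static void main(String[] args) {
        // ASCII-only text: both encodings produce the same bytes.
        System.out.println(sameBytesInBothEncodings("abc"));
        // U+00E9 is not ASCII: UTF-8 needs two bytes, US-ASCII substitutes one.
        System.out.println(sameBytesInBothEncodings("\u00e9"));
    }
}
```

The second call differs because US-ASCII cannot represent U+00E9 and getBytes substitutes a replacement byte, while UTF-8 emits a two-byte sequence.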
But in your case, if you have a String [sic] you shouldn't
care at all about encoding details: UTF-8 or little faeries
wearing boots drawing you characters using magical powder has
no importance.
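Things quickly get messy in Java because, when Java was created,
Unicode didn't define code points outside the BMP (the Basic
Multilingual Plane, U+0000 to U+FFFF). So we end up with a backward
compatible charAt(..) method that is broken beyond repair: it returns
a 16-bit char, so it definitely does NOT give back the character at
'x' when you have a String that contains characters outside the BMP.
Each such character occupies two chars (a surrogate pair), and
charAt(..) hands you only one half of the pair.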
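A quick way to see the charAt(..) problem for yourself is sketched below, using U+1040B written as a surrogate pair. The class and constant names are hypothetical; the String and Character methods are the standard ones.

```java
// Hypothetical demo class, not part of the original thread.
class CharAtDemo {
    // U+1040B, a single code point outside the BMP, written as a surrogate pair.
    static final String SUPPLEMENTARY = "\uD801\uDC0B";

    public static void main(String[] args) {
        // length() counts 16-bit chars, not characters.
        System.out.println(SUPPLEMENTARY.length());
        // codePointCount(..) counts actual code points.
        System.out.println(SUPPLEMENTARY.codePointCount(0, SUPPLEMENTARY.length()));
        // charAt(0) yields only the high surrogate, not the character itself.
        System.out.println(Integer.toHexString(SUPPLEMENTARY.charAt(0)));
        System.out.println(Character.isHighSurrogate(SUPPLEMENTARY.charAt(0)));
    }
}
```

So a String holding one visible character can report a length of 2, and charAt(0) returns a value that is not a character on its own.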
That said, not all hope is lost, for we now have the codePointAt(..)
method, which works correctly for code points outside the BMP, as
shown in the example below:
@Test public void tests() {
    // assertEquals takes (expected, actual)
    assertEquals( "0", Integer.toString("\u0000".codePointAt(0)) );
    // Java offers no easy way to source code encode, say, U+1040B (dec 66571)
    assertEquals( "66571", Integer.toString("\uD801\uDC0B".codePointAt(0)) ); // 0x1040B (hex)
    assertEquals( "97", Integer.toString("a".codePointAt(0)) );
}
If you're curious as to how to do what Integer.toString(..) does
you can look at the source code for the Integer class.
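If you'd rather not dig through the JDK source, the general idea can be sketched with repeated division by 10. To be clear, this is NOT the JDK's implementation, just an illustration of the technique; the class and method names are invented.

```java
// Hypothetical sketch; the real Integer.toString(int) is implemented differently.
class ManualDecimal {
    static String toDecimalString(int value) {
        if (value == 0) {
            return "0";
        }
        long v = value;              // widen so negating Integer.MIN_VALUE cannot overflow
        boolean negative = v < 0;
        if (negative) {
            v = -v;
        }
        StringBuilder digits = new StringBuilder();
        while (v > 0) {
            digits.append((char) ('0' + (v % 10))); // least significant digit first
            v /= 10;
        }
        if (negative) {
            digits.append('-');
        }
        return digits.reverse().toString();         // flip into reading order
    }

    public static void main(String[] args) {
        System.out.println(toDecimalString("a".codePointAt(0)));
    }
}
```

In real code just call Integer.toString(int); the sketch is only there to show there is no magic involved.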
Note that Integer.toString(int) works as expected on
entities outside the BMP:
Integer.toString("\uD801\uDC0B".codePointAt(0))
gives back the expected "66571" string.
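Putting the pieces together for the original question, a small helper could look like the sketch below. The name toDecimal and the decision to reject anything other than exactly one code point are my own assumptions, not something the original poster specified.

```java
// Hypothetical helper class, not part of the original thread.
class CodePointDecimal {
    // Returns the decimal representation of the single code point in s.
    static String toDecimal(String s) {
        // Assumption: the input must hold exactly one code point
        // (one char, or one surrogate pair).
        if (s.codePointCount(0, s.length()) != 1) {
            throw new IllegalArgumentException("expected exactly one code point");
        }
        return Integer.toString(s.codePointAt(0));
    }

    public static void main(String[] args) {
        System.out.println(toDecimal("a"));            // ASCII letter
        System.out.println(toDecimal("\uD801\uDC0B")); // U+1040B, outside the BMP
    }
}
```

Checking codePointCount(..) rather than length() is what lets a surrogate pair through as a single character.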
By now you can expect the "JLS-nazi bot" (that shall recognize
itself) to nitpick on grammatical mistakes and claim loudly
that Java is perfect and that the fact that we have both a
(broken) charAt(..) method and codePointAt(..) is not a
problem at all.
But as usual the "JLS-nazi bot"'s deranged ramblings shall be
sent to /dev/null without any consideration.