Error (?) writing foreign-language (French/Japanese/..) string from Java program to a file



Hi all,

I am trying to write a text file from a Java app, so the
file can be correctly viewed by external programs
(e.g. IE, EditPlus...).

I encode the content of the file to UTF-8 and write
it the standard Java way (see code below).

I then open the file with IE, EditPlus...

If the string was in English, it is correctly
displayed - no problems.

If the string was in French/Arabic/Japanese:
I get garbage: '???????'.

Same happens when I switch to
using UTF-16 encoding.

In both cases, when I read the output file back into the
original Java app, I find that its content (after decoding)
equals the original string.

I take it to mean that the encoding/decoding
process was successful.

What am I doing wrong?


The code:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

public void encrypt_decrypt_a_string() throws IOException {
    Charset utf8_cs = Charset.forName("UTF-8"); // or ("UTF-16");
    String FILE_PATH = "/tmp/test-file.htm";
    String STR = "some french or Japanese text here...";

    // write to file; closing the writer flushes the encoded bytes to disk
    try (OutputStreamWriter osw = new OutputStreamWriter(
            new FileOutputStream(FILE_PATH), utf8_cs)) {
        osw.write(STR);
    }

    // read from file
    String string;
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
            new FileInputStream(FILE_PATH), utf8_cs))) {
        string = br.readLine();
    }

    // verify enc/dec succeeded (readLine() returns null for an empty file)
    if (!STR.equals(string))
        throw new RuntimeException("enc/dec failure..");

    System.out.println("enc/dec success");
}



John C. Bollinger

qqq111 said:
What am I doing wrong?

It is possible that your editor / browser is not detecting the encoding
correctly. If UTF-8 is not the system's default encoding (a likely
scenario) then that wouldn't be very surprising.
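
One way to take the guessing out of detection, assuming the output really is meant to be an HTML page (the .htm extension suggests so), is to have the file declare its own encoding. The following is only a sketch: the class name, the method names and the output file name are made up for illustration, and the text is not HTML-escaped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8HtmlWriter {

    // Wraps the text in a minimal page whose header states the encoding,
    // so a browser does not have to fall back on the system default.
    static String toHtmlPage(String text) {
        return "<html><head>"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
             + "</head><body>" + text + "</body></html>";
    }

    // Writes the content as UTF-8; try-with-resources closes (and flushes) the writer.
    static void writeUtf8(Path file, String content) {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            out.write(content);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file pure ASCII.
        String text = "\u00e9t\u00e9 \u65e5\u672c\u8a9e";
        writeUtf8(Paths.get("test-file.htm"), toHtmlPage(text)); // hypothetical file name
    }
}
```

A plain text editor such as EditPlus would not act on that tag, so there the encoding may still have to be selected by hand when opening the file.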

As another poster noted, there is also a potential issue with the system
/ application using a font that contains glyphs for the characters in
question. Though that might present a problem for Arabic or Japanese
text, it oughtn't to be an issue for French, or any other western
European language.


Issue solved.

The problem was due to us inserting the foreign string using
our IDE editor (IntelliJ IDEA). In it, the string _appeared_ to be
correct, but it was not. It took me walking through its char
content with a debugger to realize that.
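
For anyone who wants to do the same check without a debugger, here is a rough sketch that prints the code points a string really holds. The class and method names are made up for illustration.

```java
public class CharDump {

    // Returns every code point of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs as one unit
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The literal is written with Unicode escapes, so it does not
        // depend on the encoding the source file was saved in.
        String str = "\u00e9t\u00e9";
        System.out.println(dump(str)); // prints U+00E9 U+0074 U+00E9
    }
}
```

If a string that looks right in the editor dumps as U+003F (a literal question mark) or as pairs such as U+00C3 U+00A9 where a single accented letter was expected, the damage happened before the file was ever written.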

The problem vanished once we switched to reading foreign
strings from an external resource file (written in Notepad).
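
Reading such a file could look roughly like the sketch below. It assumes the file was saved as UTF-8 (Notepad has to be told to do that in its Save dialog, and may then put a byte order mark at the start); the class name, method names and the file name passed on the command line are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceText {

    // Decodes UTF-8 bytes and drops a leading byte order mark, if there is one.
    static String decode(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        return text;
    }

    // Reads the whole file with an explicit charset instead of the platform default.
    static String readUtf8(Path file) {
        try {
            return decode(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java ResourceText <utf-8 text file>");
            return;
        }
        System.out.println(readUtf8(Paths.get(args[0])));
    }
}
```

Note that System.out encodes with the console's own charset, so the printed text can still look wrong on screen even when the string itself is fine.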

Lesson learned: things are not always the way they seem...
