Error (?) writing a foreign-language (French/Japanese/...) string from a Java program to a file


qqq111

Hi all,

I am trying to write a text file from a Java app so that the
file can be viewed correctly in external viewers
(e.g., IE, EditPlus...).

I encode the content of the file to UTF-8 and write
it the standard Java way (see code below).

I then open the file with IE, EditPlus...

If the string is in English, it is displayed
correctly; no problems.

If the string is in French, Arabic, or Japanese,
I get garbage: '???????'.

The same happens when I switch to
UTF-16 encoding.


In both cases, when I read the output file back into the
original Java app, I find that its content (after decoding)
equals the original string.

I take this to mean that the encoding/decoding
process was successful.
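A successful round trip only shows that the writer and the reader agree with each other; it does not show that the string held the intended characters to begin with. The minimal sketch below illustrates this. To simulate a string that was already damaged before the program ran, it uses a lossy US-ASCII conversion, which replaces every non-ASCII char with '?'. That simulation is an assumption for illustration, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripCheck {

    // Encodes with UTF-8 and decodes again, like the write/read in the question.
    static String utf8RoundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Simulated damage (assumption): getBytes(US_ASCII) replaces each
    // unmappable char with the charset's replacement byte, '?'.
    static String simulateDamagedLiteral(String s) {
        return new String(s.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String intended = "\u65E5\u672C\u8A9E"; // Japanese "nihongo" as Unicode escapes
        String damaged = simulateDamagedLiteral(intended);

        System.out.println(damaged);                                // prints ???
        System.out.println(utf8RoundTrip(damaged).equals(damaged)); // prints true
        System.out.println(damaged.equals(intended));               // prints false
    }
}
```

The round-trip check passes for the damaged string too, so it cannot tell "file written wrongly" apart from "string was wrong before writing".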


What am I doing wrong?

Thanks,
Gilad




The code:

import java.io.*;
import java.nio.charset.Charset;

public void encrypt_decrypt_a_string() throws IOException
{
    Charset utf8_cs = Charset.forName("UTF-8"); // or ("UTF-16");
    String FILE_PATH = "/tmp/test-file.htm";
    String STR = "some french or Japanese text here...";


    // write to file: chars are encoded with utf8_cs
    // (try-with-resources closes the writer even if write() fails)
    //
    try (OutputStreamWriter osw = new OutputStreamWriter(
            new FileOutputStream(FILE_PATH), utf8_cs)) {
        osw.write(STR);
    }


    // read from file: bytes are decoded with the same charset
    // (readLine() returns only the first line, which is enough here)
    //
    String string;
    try (BufferedReader br = new BufferedReader(new InputStreamReader(
            new FileInputStream(FILE_PATH), utf8_cs))) {
        string = br.readLine();
    }


    // verify the round trip succeeded
    // (STR.equals(...) also copes with an empty file, where readLine() returns null)
    //
    if (!STR.equals(string))
    {
        throw new RuntimeException("encode/decode failure");
    }

    System.out.println("encode/decode success");

}

//EOF
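One way to tell a viewer problem from a data problem is to look at the bytes that actually end up on disk. The following is a minimal sketch, assuming Java 7 or later (for java.nio.file and try-with-resources). The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileBytesDump {

    // Writes text as UTF-8, then returns the raw file bytes as space-separated hex,
    // i.e. exactly what an external viewer will receive.
    static String writeAndDump(Path file, String text) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            w.write(text);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : Files.readAllBytes(file)) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF)); // unsigned byte value
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("enc-test", ".htm");
        // French "ete" with accents, as escapes: in UTF-8 each e-acute should be C3 A9
        System.out.println(writeAndDump(tmp, "\u00E9t\u00E9")); // prints C3 A9 74 C3 A9
        Files.delete(tmp);
    }
}
```

If the dump shows 3F (the byte for '?') where the accented or Japanese characters should be, the string already contained '?' before it was written, and no viewer setting will fix it.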
 

John C. Bollinger

qqq111 said:
What am I doing wrong?

It is possible that your editor/browser is not detecting the encoding
correctly. If UTF-8 is not the system's default encoding (a likely
scenario), then that wouldn't be very surprising.
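If the viewer is guessing, the file can carry its own encoding label. The sketch below assumes the output really is an HTML page, as the .htm extension in the question suggests. It declares UTF-8 in a <meta> tag and optionally writes a UTF-8 byte order mark (BOM). The class and method names are hypothetical, and whether a particular viewer honors the BOM or the tag varies from program to program.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class LabeledHtmlWriter {

    // Wraps the text in a minimal HTML page whose <meta> tag declares UTF-8,
    // so a browser does not have to fall back on the system default encoding.
    static String page(String body) {
        return "<html><head>"
             + "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
             + "</head><body>" + body + "</body></html>";
    }

    // Writes the page as UTF-8. If withBom is true, U+FEFF is written first;
    // encoded as UTF-8 it becomes the bytes EF BB BF (the byte order mark).
    static void write(Path file, String body, boolean withBom) throws IOException {
        try (Writer w = new OutputStreamWriter(Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            if (withBom) {
                w.write('\uFEFF');
            }
            w.write(page(body));
        }
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempFile("test-file", ".htm");
        write(out, "\u00E9t\u00E9 \u65E5\u672C\u8A9E", true);
        byte[] b = Files.readAllBytes(out);
        System.out.printf("%02X %02X %02X%n", b[0] & 0xFF, b[1] & 0xFF, b[2] & 0xFF); // prints EF BB BF
        System.out.println(out); // open this file in the browser/editor to check
    }
}
```

Java's UTF-8 decoder does not strip a BOM when reading. If you write one, the round-trip check in the original code would therefore see an extra U+FEFF at the start of the first line.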

As another poster noted, there is also a potential issue if the system
or application is not using a font that contains glyphs for the characters
in question. Though that might present a problem for Arabic or Japanese
text, it oughtn't to be an issue for French or any other Western
European language.
 

qqq111

Issue solved.

The problem was caused by our typing the foreign string into the
source code using our IDE's editor (IntelliJ IDEA). In it, the string _appeared_ to be
correct, but it was not. It took stepping through its char
content in a debugger for me to realize that.
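The char-by-char inspection done in the debugger can also be done in code. The thread does not say why the literal typed in the IDE came out wrong. One common cause, offered here only as an assumption, is a mismatch between the encoding the source file is saved in and the encoding javac uses to read it (javac's -encoding option controls the latter). The sketch below prints each char as U+XXXX and uses Unicode escapes for the reference string. Because escapes are pure ASCII, they survive such a mismatch. The names are hypothetical.

```java
public class CharDump {

    // Lists each char of s as U+XXXX: the same information a debugger shows.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Reference string written with escapes, immune to source-encoding mismatches
        String viaEscapes = "\u00E9t\u00E9";
        System.out.println(codeUnits(viaEscapes)); // prints U+00E9 U+0074 U+00E9
        // Calling codeUnits() on the same word typed directly into the editor and
        // getting anything else means the literal was damaged at compile time.
    }
}
```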

The problem vanished once we switched to reading the foreign
strings from an external resource file (written in Notepad).
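A minimal sketch of loading such strings, assuming a key=value properties file saved as UTF-8. Windows Notepad has historically written a BOM when saving as UTF-8, and Java's UTF-8 decoder passes that BOM through as U+FEFF, so the sketch strips it first. It uses the Properties.load(Reader) overload, because load(InputStream) assumes ISO-8859-1. The file name and the key "greeting" are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ResourceStrings {

    // Reads a UTF-8 key=value file, dropping a leading BOM if present.
    static Properties load(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // otherwise the BOM would become part of the first key
        }
        Properties props = new Properties();
        props.load(new StringReader(text)); // Reader overload: no ISO-8859-1 assumption
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical resource file, created here so the example is self-contained
        Path file = Files.createTempFile("strings", ".properties");
        Files.write(file, "\uFEFFgreeting=\u00E9t\u00E9\n".getBytes(StandardCharsets.UTF_8));
        String greeting = load(file).getProperty("greeting");
        System.out.println(greeting.equals("\u00E9t\u00E9")); // prints true
        Files.delete(file);
    }
}
```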

Lesson learned: things are not always the way they seem...
 
