Encoding in file

Lukasz

Hi,

In my application I create some files and write some text into them. I
want to use UTF-8 encoding, but both approaches that I tried seem to
ignore the specified encoding. I used:

OutputStream fout= new FileOutputStream(nazwa);
OutputStream bout= new BufferedOutputStream(fout);
OutputStreamWriter out = new OutputStreamWriter(bout, "UTF8");

and

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(nazwa),"UTF8"));

The problem seems simple, but it is often hard to find an answer
to the simplest questions.
 
Thomas Kellerer

Lukasz wrote on 27.09.2006 10:24:
I want to use UTF-8 encoding, but both methods that I tried seem to
ignore specified encoding.

Can you be more specific about what you mean by "seem to ignore"?

BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new
FileOutputStream(nazwa),"UTF8"));

This works for me, the only difference being that I use "UTF-8".

Thomas
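One way to check whether the encoding is really being ignored is to write a known string and look at the bytes that end up in the file. The following is only a minimal sketch, assuming Java 7 or later (for StandardCharsets and try-with-resources); the class and method names are made up for illustration. It uses the same OutputStreamWriter wrapping as in the question and closes the writer, since a BufferedWriter that is never flushed or closed can leave the file empty or truncated.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original thread.
public class Utf8WriteCheck {

    // Writes text to the file as UTF-8, wrapping the streams as in the question.
    static void writeUtf8(Path file, String text) throws IOException {
        try (Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8))) {
            out.write(text);
        } // try-with-resources closes the writer, which flushes the buffer
    }

    // Writes text to a temporary file and returns the raw bytes found on disk.
    static byte[] roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-check", ".txt");
            try {
                writeUtf8(tmp, text);
                return Files.readAllBytes(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Two ASCII letters (1 byte each) and two Polish letters (2 bytes each in UTF-8).
        byte[] bytes = roundTrip("z\u00F3\u0142w");
        System.out.println(bytes.length + " bytes written");
    }
}
```

If the byte count is larger than the character count for text containing non-ASCII letters, the encoding is being applied; plain ASCII text looks identical in UTF-8 and in most single-byte encodings, so it cannot be used to tell them apart.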
 
Lukasz

In UTF-8, for example, the " sign should be replaced with ;quote (or
something like that). Neither of my methods does that.
 
Thomas Kellerer

Lukasz wrote on 27.09.2006 11:44:
In UTF-8, for example " sign should be replaced with ;quote

Not at all!

What you are describing is HTML (or XML) "escaping".
That has nothing to do with the encoding of characters.

UTF-8 is a variable-length encoding: characters in the 7-bit ASCII range are
stored as a single byte, and all other characters are stored as two, three, or
four bytes.

The " sign is in the ASCII range and is encoded with one byte (hex 22).
The Euro symbol, for example, is outside the ASCII range. Its Unicode code point
is U+20AC, and UTF-8 encodes it with three bytes (E2 82 AC).

Thomas
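To see the byte values for yourself, you can print the UTF-8 bytes of a string in hex. This is just an illustrative sketch (the class and method names are invented here, and StandardCharsets assumes Java 7 or later). Note that 20AC is the Unicode code point of the Euro sign, not its UTF-8 byte sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original thread.
public class Utf8Bytes {

    // Returns the UTF-8 bytes of the text as space-separated uppercase hex.
    static String toHex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("\""));     // prints 22
        System.out.println(toHex("\u20AC")); // prints E2 82 AC
    }
}
```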
 
Lukasz

And what should I do to replace this " sign with :quote, and to replace
other characters as well, using XML escaping?
 
Thomas Kellerer

Lukasz wrote on 27.09.2006 12:12:
And what should I do to replace this " sign with :quote, and to replace
other characters as well, using XML escaping?

There is no standard API (as far as I know). You'll have to roll your own. But
maybe the Jakarta site has something.

Thomas
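For what it's worth, a hand-rolled version could look roughly like the sketch below. It replaces only the five characters that have predefined entities in XML (&amp;, &lt;, &gt;, &quot;, &apos;); the class and method names are hypothetical, and it makes no attempt to handle characters that are not allowed in XML at all.

```java
// Hypothetical helper class, not part of the original thread.
public class XmlEscape {

    // Replaces the five XML special characters with their predefined entities.
    static String escapeXml(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);        // everything else is copied unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: say &quot;hi&quot; &amp; &lt;go&gt;
        System.out.println(escapeXml("say \"hi\" & <go>"));
    }
}
```

The escaped string is still ordinary text, so it can be written through the same UTF-8 OutputStreamWriter as before; escaping and encoding are two separate steps.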
 
Steve W. Jackson

Thomas Kellerer said:
There is no standard API (as far as I know). You'll have to roll
your own. But maybe the Jakarta site has something.

If the information being written is actually XML, it should be a
non-issue. I've found that it's necessary to use the UTF-8 encoding
name on the OutputStreamWriter to ensure that the file itself gets that
encoding, but the method used to serialize the XML must also know that
it should use UTF-8 and it will automatically take care of this
"escaping".

= Steve =
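As a rough illustration of letting the serializer do the work, here is a sketch using the StAX XMLStreamWriter that ships with the JDK (Java 6 and later). This is only one possible serializer, not necessarily the one used above; the class name, the method name, and the "note" element are invented for the example. The writer is told to use UTF-8, and the markup characters in the text are escaped without any extra code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class, not part of the original thread.
public class XmlSerializeDemo {

    // Serializes the text as the content of a single <note> element, in UTF-8.
    static String serialize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            XMLStreamWriter xml =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(buffer, "UTF-8");
            xml.writeStartDocument("UTF-8", "1.0"); // arguments: encoding, then version
            xml.writeStartElement("note");
            xml.writeCharacters(text);              // the writer escapes markup characters
            xml.writeEndElement();
            xml.writeEndDocument();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        // Decode only so the result can be printed; a real program would write to a file.
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(serialize("a < b & c"));
    }
}
```

In a real program the ByteArrayOutputStream would be replaced by a FileOutputStream, so that the bytes on disk and the encoding named in the XML declaration agree.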
 
