UTF-8 encoding

Nishi Bhonsle

Hi:
In a servlet application, I need to pass a UTF-8 encoded writer to a Java API, which will process the contents of a file through the writer. Thereafter, the file can be saved on the user's machine through an OS-specific "File Save As" dialog box. The UTF-8 encoding takes into account non-ASCII data (users in non-English locales).

I noticed that this works fine in IE as well as Netscape for the English locale, but for non-English locales IE does not pop up the "File Save As" box and instead displays the contents of the file in the same browser window, whereas Netscape saves the file as a 0-byte file.
Can someone please let me know what could be wrong with the code below?

// This page has the "download" property set, so temp.txt will be saved on the
// user's machine by showing the user a "File Save As" dialog box.
try {
    java.io.Writer utf8Writer = new OutputStreamWriter(new FileOutputStream("temp.txt", false), "UTF-8");

    <APIname>(utf8Writer); // API call
    utf8Writer.flush();

    java.io.InputStream is = new BufferedInputStream(new FileInputStream("temp.txt"));
    BufferedReader in = new BufferedReader(new InputStreamReader(is));

    // Make sure that the contents get saved in readable format
    String inputLine;
    String newLine = " ";
    String newline = System.getProperty("line.separator");

    while ((inputLine = in.readLine()) != null)
    {
        newLine = newLine.concat(inputLine);
        newLine = newLine.concat(newline);
    }
    StringBufferInputStream sis = new StringBufferInputStream(newLine);
    // set the stream for download to be sis
    // download the temp.txt through a download bean
    is.close();
    utf8Writer.close();
}
catch (Throwable t)
{
    handleError(t);
}
}
 
Chris Smith

Nishi,

See below for an answer.

As a side note, would it be possible to convince you to choose a
reasonable line wrap size in the future? It's a real pain to reformat
all the quoting when responding to your message.

Nishi said:
In a servlet application, I need to pass a UTF-8 encoded writer
to a Java API, which will process the contents of a file through
the writer. Thereafter, the file can be saved on the user's machine
through an OS-specific "File Save As" dialog box. The UTF-8 encoding
takes into account non-ASCII data (users in non-English locales).

I noticed that this works fine in IE as well as Netscape for the English
locale, but for non-English locales IE does not pop up the "File Save As"
box and instead displays the contents of the file in the same browser
window, whereas Netscape saves the file as a 0-byte file.

Can someone please let me know what could be wrong with the code
below?

Sure. Here are a few comments.

First, you are writing a file in UTF-8 encoding, then turning around and
reading that file with the system's default encoding. That has
undefined results, and will only do something sensible if the system
default encoding happens to be UTF-8. If you know that you've written
the file in UTF-8, you should create an InputStreamReader using the
UTF-8 encoding explicitly, and use that to read the file back again.
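
As a rough illustration of that first point, here is a minimal sketch that writes a temporary file as UTF-8 and reads it back with the same explicit encoding. The class and method names (Utf8RoundTrip, writeUtf8, readUtf8, roundTrips) are made up for this example and are not part of the code under discussion; it also uses StandardCharsets, which assumes Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, for illustration only.
final class Utf8RoundTrip {

    // Encodes the text as UTF-8 bytes and writes them to the file.
    static void writeUtf8(Path file, String text) {
        try (Writer out = new OutputStreamWriter(
                Files.newOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the file explicitly as UTF-8 instead of the platform default.
    static String readUtf8(Path file) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                Files.newInputStream(file), StandardCharsets.UTF_8))) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Writes the text to a temporary file and checks that it reads back unchanged.
    static boolean roundTrips(String text) {
        try {
            Path tmp = Files.createTempFile("utf8-demo", ".txt");
            try {
                writeUtf8(tmp, text);
                return text.equals(readUtf8(tmp));
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("na\u00efve caf\u00e9 \u65e5\u672c\u8a9e\n"));
    }
}
```

Because both sides name the encoding, the result no longer depends on the default encoding of the machine the servlet runs on.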

Second, the code you posted doesn't compile. There's some confusion
there where newLine is sometimes treated as a String (and declared as a
String), but used elsewhere as if it were a StringBuffer. I'm going to
assume that this is related to copying the code into your newsreader.
Copy/paste works great for that, and saves you from these kinds of
unintentional mistakes.

Third, you completely destroy any shot of writing working code when you
use the StringBufferInputStream class. There's a very good reason that
it's deprecated.
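
To make the third point concrete, the sketch below compares the bytes produced by StringBufferInputStream with the UTF-8 bytes of the same text. The class is documented as using only the low eight bits of each character, so anything outside that range is mangled. The names LowByteDemo, viaStringBufferInputStream, viaUtf8 and drain are invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class LowByteDemo {

    // Reads the stream to the end and returns every byte it produced.
    static byte[] drain(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Deprecated route: each char is truncated to its low eight bits.
    @SuppressWarnings("deprecation")
    static byte[] viaStringBufferInputStream(String text) {
        return drain(new java.io.StringBufferInputStream(text));
    }

    // Explicit route: the String is encoded as UTF-8 before it becomes a stream.
    static byte[] viaUtf8(String text) {
        return drain(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
    }

    // Formats bytes as space-separated hex pairs for printing.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // the euro sign, U+20AC
        System.out.println("StringBufferInputStream: " + hex(viaStringBufferInputStream(euro)));
        System.out.println("UTF-8 bytes:             " + hex(viaUtf8(euro)));
    }
}
```

For the euro sign the deprecated class yields the single byte AC, while UTF-8 needs the three bytes E2 82 AC, so a browser decoding the download as UTF-8 would see invalid data.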

Fourth, why on earth do you have one variable called 'newline' and
another called 'newLine'? You'd have to try really hard to come up with
something so bug-prone as that.


If you can clarify what you mean to accomplish by everything past where
you create the BufferedInputStream, perhaps I can help more. Looks to
me like the only purpose of any code past this:
// This page has the "download" property set, so temp.txt will be saved on the
// user's machine by showing the user a "File Save As" dialog box.
try {
    java.io.Writer utf8Writer = new OutputStreamWriter(new FileOutputStream("temp.txt", false), "UTF-8");

    <APIname>(utf8Writer); // API call
    utf8Writer.flush();

    java.io.InputStream is = new BufferedInputStream(new FileInputStream("temp.txt"));

.... is to break things. Just do the above, and as far as I can tell you
are done.
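
The idea of handing the file's bytes over untouched can be sketched as a plain byte copy. Nothing is decoded, so the UTF-8 bytes written earlier arrive at the destination unchanged. RawCopy and copy are names invented for this example; in the thread's code the destination would be whatever stream the download mechanism provides, which is not shown there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class RawCopy {

    // Copies every byte from in to out without decoding; returns the byte count.
    static long copy(InputStream in, OutputStream out) {
        byte[] buffer = new byte[8192];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for the UTF-8 file contents; a FileInputStream would work the same way.
        byte[] fileBytes = "Gr\u00fc\u00dfe aus K\u00f6ln\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(fileBytes), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Since no Reader or String is involved, there is no step at which the wrong encoding could be applied.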

--
www.designacourse.com
The Easiest Way to Train Anyone... Anywhere.

Chris Smith - Lead Software Developer/Technical Trainer
MindIQ Corporation
 
Nishi Bhonsle

Chris:

Chris said:
Nishi,

See below for an answer.

As a side note, would it be possible to convince you to choose a
reasonable line wrap size in the future? It's a real pain to reformat
all the quoting when responding to your message.



Sure. Here are a few comments.

First, you are writing a file in UTF-8 encoding, then turning around and
reading that file with the system's default encoding. That has
undefined results, and will only do something sensible if the system
default encoding happens to be UTF-8. If you know that you've written
the file in UTF-8, you should create an InputStreamReader using the
UTF-8 encoding explicitly, and use that to read the file back again.

I changed that but it still does not work.

Chris said:
Second, the code you posted doesn't compile. There's some confusion
there where newLine is sometimes treated as a String (and declared as a
String), but used elsewhere as if it were a StringBuffer. I'm going to
assume that this is related to copying the code into your newsreader.
Copy/paste works great for that, and saves you from these kinds of
unintentional mistakes.

This is required because I noticed that the file that gets saved on the system contains the data/text as one single string. So I had to read the contents
of the buffer line by line, separating the lines with the line separator.

Chris said:
Third, you completely destroy any shot of writing working code when you
use the StringBufferInputStream class. There's a very good reason that
it's deprecated.

Since I was reading the buffer into the above-mentioned string, I have to use the StringBufferInputStream class, which accepts a String. Moreover, I
have a downloadBean defined for that page that works only on streams of data and not readers, hence I cannot use StringReader in place of
StringBufferInputStream. Can you suggest anything?

Chris said:
Fourth, why on earth do you have one variable called 'newline' and
another called 'newLine'? You'd have to try really hard to come up with
something so bug-prone as that.

I changed this.

Chris said:
If you can clarify what you mean to accomplish by everything past where
you create the BufferedInputStream, perhaps I can help more. Looks to
me like the only purpose of any code past this:

I still see the following for non-ASCII data:
IE -- part of the data is displayed in the same browser window; no "File Save As" download box is shown.
Netscape 7.1 -- the "File Save As" box appears and the file is saved, but the operation goes into a loop, and the successive files that get
downloaded are 0 KB.

try {

    java.io.Writer utf8Writer = new OutputStreamWriter(new FileOutputStream("temp.txt", false), "UTF-8");

    <APIname>(utf8Writer);
    utf8Writer.flush();

    java.io.InputStream is = new BufferedInputStream(new FileInputStream("temp.txt"));
    BufferedReader in = new BufferedReader(new InputStreamReader(is, "UTF-8"));

    String fileContents;
    String newLine = " ";
    String lineSeparator = System.getProperty("line.separator");

    while ((fileContents = in.readLine()) != null)
    {
        newLine = newLine.concat(fileContents);
        newLine = newLine.concat(lineSeparator);
    }
    is.close();
    in.close();

    StringBufferInputStream sis = new StringBufferInputStream(newLine);
    //StringReader sis = new StringReader(newLine);

    nextPage.setBooleanProperty("download", true);
    //sis.close();

    //is.close();
    utf8Writer.close();
    setNextPage(nextPage);
    DownloadBean dataBean = new DownloadBean();
    dataBean.setStream(sis);
    //dataBean.setReader(sis);
    dataBean.setFileName("download.txt");
    dataBean.setSize(newLine.length());
    dataBean.setSuccessMessage(rb.getString("EXPORT_SUCCESS"));
    dataBean.setStatus(dataBean._SUCCESS);
    //setRedirectPage(nextPage);
    setDataBean(dataBean);
    setStatus(EventDone);
}
catch (Throwable t)
{
    setDataBean(null);
    handleError(t);
}
}
 
Steve Horsley

Nishi said:
Hi:
In a servlet application, I need to pass a UTF-8 encoded writer to a
Java API, which will process the contents of a file through the writer.
Thereafter, the file can be saved on the user's machine through an
OS-specific "File Save As" dialog box. The UTF-8 encoding takes into
account non-ASCII data (users in non-English locales).

I noticed that this works fine in IE as well as Netscape for the English
locale, but for non-English locales IE does not pop up the "File Save As"
box and instead displays the contents of the file in the same browser
window, whereas Netscape saves the file as a 0-byte file.
Can someone please let me know what could be wrong with the code below?

I have a feeling this may have to do with the MIME type that the server
reports for the file in the headers of its reply to the GET request.
I think the decision to offer a "File Save As" dialog
is made before the browser ever attempts to read the file contents, in which
case the actual encoding you use is irrelevant to your current problem.

I could be completely wrong though.

Steve
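
For reference, the response headers that usually influence whether a browser offers to save a file are Content-Type and Content-Disposition. The sketch below only builds the header values as a map so that it runs without a servlet container; DownloadHeaders and forAttachment are invented names, and the choice of text/plain with a UTF-8 charset is an assumption about what this download should declare. In a servlet, the same values would typically be passed to HttpServletResponse.setHeader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class DownloadHeaders {

    // Builds header values that ask the browser to save the body as a file.
    // Assumes fileName is plain ASCII and contains no double quotes.
    static Map<String, String> forAttachment(String fileName, int bodyLengthInBytes) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Declares the body as text and states which encoding its bytes use.
        headers.put("Content-Type", "text/plain; charset=UTF-8");
        // "attachment" requests a save dialog instead of inline display.
        headers.put("Content-Disposition", "attachment; filename=\"" + fileName + "\"");
        // The length is counted in bytes, not in characters.
        headers.put("Content-Length", Integer.toString(bodyLengthInBytes));
        return headers;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : forAttachment("download.txt", 1234).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Whether the browsers mentioned in this thread honour these headers in every case is not something the sketch can confirm; it only shows what a server would normally send.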
 
Chris Smith

Again,

Nishi said:
I changed that but it still does not work.

Yep. That was one of several problems in your code. You'll need to fix
the others.

[newline stuff]

Nishi said:
This is required because I noticed that the file that gets saved on the
system contains the data/text as one single string. So I had to read the
contents of the buffer line by line, separating the lines with the line separator.

Okay, that makes more sense, then. You'll just need to fix this so
you're doing it right. Still, it appears that there's some confusion
between String and StringBuffer in your code.

Nishi said:
Since I was reading the buffer into the above-mentioned string, I have
to use the StringBufferInputStream class, which accepts a String. Moreover,
I have a downloadBean defined for that page that works only on streams
of data and not readers, hence I cannot use StringReader in place of
StringBufferInputStream. Can you suggest anything?

The fact remains that StringBufferInputStream is broken. You can't use
it if you want your code to have any shot at working with anything but
ASCII or ISO-8859-1 encodings.

As for how to read a String as an InputStream, you need to first choose
an encoding. You've been using UTF-8, and there's no reason to switch
now. Next, you need to convert the String to a byte sequence using that
encoding (String.getBytes(String enc) works if you expect the result to
fit in memory; otherwise, you can write the String to an
OutputStreamWriter -- and specify the encoding in the constructor --
wrapping a FileOutputStream for a temporary file). Finally, you can
open an InputStream for that byte sequence (ByteArrayInputStream if it's
in memory, or FileInputStream if you wrote to a temporary file on disk).
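
The in-memory variant of those steps can be sketched as follows. Utf8Download, encode and asStream are names invented for this example. The sketch also prints the character count next to the byte count, because the two differ for non-ASCII text; if a size handed to something like the thread's DownloadBean is meant as a byte count (an assumption, since that class is not shown), the byte array length would be the value to use.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8Download {

    // Step 1 and 2: choose UTF-8 and convert the String to a byte sequence.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Step 3: open an InputStream over that byte sequence.
    static InputStream asStream(byte[] utf8Bytes) {
        return new ByteArrayInputStream(utf8Bytes);
    }

    public static void main(String[] args) {
        String text = "Gr\u00fc\u00dfe, \u20ac5\n";
        byte[] body = encode(text);
        InputStream stream = asStream(body);

        // The char count and the byte count differ once non-ASCII text is involved.
        System.out.println("chars: " + text.length());
        System.out.println("bytes: " + body.length);
        System.out.println("stream ready: " + (stream != null));
    }
}
```

If the content is too large to hold in memory, the temporary-file route described above avoids building the byte array at all.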

 
Icemerth

Steve Horsley said:
I have a feeling this may be to do with the mime file type that the server
says the file is - in the header of the reply to the GET request.
I think the decision to offer a "File Save As" dialog
is made before the browser ever attempts to read the file contents, in which
case the actual encoding you use is irrelevant to your current problem.

I could be completely wrong though.

Steve
ddd
 
Roedy Green


You have done this repeatedly. Why? Are you just trying to annoy
people for some reason? Has it some deep inner meaning that escapes
me? Is this the way you trigger a terrorist bomb in Washington?
 
Sudsy

Roedy said:
You have done this repeatedly. Why? Are you just trying to annoy
people for some reason? Has it some deep inner meaning that escapes
me? Is this the way you trigger a terrorist bomb in Washington?

I have now reported him twice to his ISP. Didn't we get something like
8 reply posts from this address yesterday?
It's one thing to misunderstand the posting mechanism, another thing
entirely when you repeat your mistake eight times! ... ;-)
 
