Peculiar issue with French characters

sumitra

Hello All,

I need to print out French characters
(ççÇÇààÀÀèèÈÈééÉÉ) in a PDF file by running my code on
Unix. I'm using iText to create the PDF. The configurations in iText
for the fonts include BaseFont.IDENTITY_H for encoding and
BaseFont.EMBEDDED.

The PDF encoding I have given is:
/BaseFont /Courier /Encoding /WinAnsiEncoding

which generates the PDFs with the French text fine on Windows. Should I
be changing this??

The problem is that with these parameters, on Unix, all I get is
garbled text in my pdf doc.

Compiling with -encoding ISO-8859-1 does not help because these French
values are picked up at run time from a Hashtable. I have checked the
Hashtable contents and they look good.

My code uses a lot of StringWriter() and I would like to know if I need
to explicitly set the encoding here to "8859_1" and if so, how?? I've
tried replacing the StringWriter with a ByteArrayOutputStream wrapped in
an OutputStreamWriter with the encoding 8859_1. That did not help.

I also tried the getBytes() method of StringWriter and tried to convert
it to another encoding, but that did not help either!!

I really am at a loss now as to how to resolve my problem.
If anyone out there has an idea do let me know please!
Thanks in advance.

--Sum
 
Thomas Hawtin

> My code uses a lot of StringWriter() and I would like to know if I need
> to explicitly set the encoding here to "8859_1" and if so, how?? I've
> tried the ByteArrayOutputStream approach to replace the StringWriter
> and wrapped that in OutputStreamWriter with the encoding 8859_1. That
> did not help.
>
> I also tried the getBytes() method of StringWriter and tried to convert
> it to another encoding, but that did not help either!!

Character encoding matters only at the point where you encode characters
as bytes (or, going the other way, decode bytes into characters).

Lots of APIs confuse the matter by picking the encoding up from the
system defaults. So code may work on one setup, but not on another. To
get around a fatal bug in Adobe Acrobat Reader I had to change
encodings, meaning I could get different results depending upon which
window/tab I launched an application from.

FileWriter doesn't support character encodings, so don't use that class.
OutputStreamWriter has constructors to take character encodings, and one
which doesn't (so don't use that one). StringWriter.getBytes does not
exist. Swing has various methods which may depend upon configured
encoding, a specified encoding or just chopping the top byte off each
character (including surrogates).
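
To make that concrete, here is a minimal sketch of naming the encoding explicitly at the one place where characters become bytes. The class and method names (ExplicitEncoding, encodeLatin1) are made up for illustration and are not from the original poster's code. Note that StringWriter itself holds characters only, so there is no encoding to set on it. FileWriter also gained Charset-taking constructors in Java 11, long after this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class ExplicitEncoding {
    // Encodes text to bytes with a named charset, never the platform default.
    static byte[] encodeLatin1(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.ISO_8859_1)) {
            out.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the source file independent of the compiler's -encoding.
        String text = "\u00e7\u00c7\u00e0\u00c0";
        byte[] viaWriter = encodeLatin1(text);
        // String.getBytes also has an overload that takes the charset explicitly.
        byte[] viaString = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(viaWriter.length + " bytes via Writer, "
                + viaString.length + " bytes via String.getBytes");
    }
}
```

With the charset spelled out like this, the bytes should come out the same on Windows and Unix whatever LANG happens to be.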

Tom Hawtin
 
Sum

My bad, I meant the String.getBytes() method and not
StringWriter.getBytes(), which as you rightly pointed out, does not
exist.

What I noticed while running my app on Unix was that the French string
being returned to my program was:

ççÃÃà à ÃÃèèÃÃééÃÃ

whereas I expected to see:

ççÇÇààÀÀèèÈÈééÉÉ

This does not happen on Windows. Also, I actually compile my code on
Windows, and put the tarball onto Unix.
What do you suppose is happening now??
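
For anyone who hits similar output: strings full of "Ã" are typically what you get when text is encoded to bytes in one charset and decoded in another. The sketch below shows one common mechanism (UTF-8 bytes read back as ISO-8859-1). It is only an assumption that something like this was going on here; the thread does not confirm which pair of encodings was involved, and MojibakeDemo, garble and repair are illustrative names.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes with UTF-8, then (wrongly) decodes the same bytes as ISO-8859-1.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes garble(): re-encode as ISO-8859-1 to recover the bytes, decode as UTF-8.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u00e9\u00e8"; // e-acute, e-grave, written as escapes
        String garbled = garble(text);
        // Each accented letter turns into two characters, the first being U+00C3.
        System.out.println(text.length() + " chars became " + garbled.length());
        System.out.println("round trip ok: " + repair(garbled).equals(text));
    }
}
```

If a mismatch like this is the cause, the fix belongs at the point where the bytes are decoded, not in the Hashtable or the StringWriter.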
 
sumitra

Figured it out. The one thing that I did not do was to start the
application (in Unix) from the same session where I had set LANG to
fr_FR. I assumed that setting LANG=fr_FR would have an environment
level effect, however that turned out to be only for that telnet
session!

Thanks for the help everyone. :-D
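
In case it helps someone else, a quick way to see what the JVM actually picked up from the session it was started in is to print the default charset next to LANG. This is just a diagnostic sketch; DefaultCharsetCheck is an illustrative name. As far as I know, from JDK 18 onwards the default charset is UTF-8 regardless of the locale (JEP 400), so on a newer JVM LANG will not change this value the way it did here.

```java
import java.nio.charset.Charset;

class DefaultCharsetCheck {
    // Collects the settings that decide which encoding "default" APIs will use.
    static String describe() {
        return "LANG=" + System.getenv("LANG")
                + " file.encoding=" + System.getProperty("file.encoding")
                + " defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // Run this from the same shell session that launches the real application.
        System.out.println(describe());
    }
}
```

Running it once from the session where LANG is set and once from the session that starts the application should show whether the two really see the same environment.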
 
