Detecting the codepage of a file being uploaded?

Mehmet Gunacti

Hello,
We receive XML files uploaded by users of our web application, which
is written in Java. We published an XSD file, so the XML files we get
are well formed. But some users generate the XML files under DOS using
the CP857 codepage, which includes Turkish characters.
After we receive an XML file we don't save it to disk; instead we
process the data and save it to a database. But the Turkish characters
are corrupted because of the "wrong" character set of the XML file.
The first line of the XML file is:
<?xml version="1.0" encoding="iSO-8859-9"?>
yet the Turkish characters it contains aren't saved correctly to the
database.
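To make the mismatch concrete, here is a minimal sketch of what presumably happens when bytes written as CP857 are decoded as ISO-8859-9. The class and method names are made up for illustration, and it assumes the JRE ships the "Cp857" charset (standard JDK builds do, but that is not guaranteed everywhere). It only uses java.nio calls that exist since Java 1.4.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the application described above.
class CodepageMismatchDemo {

    // Encode text into raw bytes using the named charset.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer buf = Charset.forName(charsetName).encode(text);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Decode raw bytes into text using the named charset.
    static String decode(byte[] raw, String charsetName) {
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(raw)).toString();
    }

    public static void main(String[] args) {
        // Turkish letters: s-cedilla, g-breve, dotless i, capital dotted I
        String original = "\u015F\u011F\u0131\u0130";

        // What a DOS tool using CP857 would write to the file.
        byte[] raw = encode(original, "Cp857");

        // What a parser produces if it trusts encoding="ISO-8859-9".
        String misread = decode(raw, "ISO-8859-9");
        // What it would produce if it knew the real codepage.
        String correct = decode(raw, "Cp857");

        System.out.println("misread matches original: " + misread.equals(original));
        System.out.println("correct matches original: " + correct.equals(original));
    }
}
```

The same bytes give different text depending on the charset used to decode them, so the declaration in the XML header only helps when it matches what the generating tool actually wrote.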

If there were some method like getEncoding() that returned "cp857",
we could tell the user to generate the file under Windows. But after
researching for days we couldn't find any useful API. We get only
CP1254 for all kinds of XML files, whether generated under DOS or
Windows, and that doesn't solve our problem.
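The post doesn't say which API returned CP1254. One plausible candidate (an assumption) is InputStreamReader.getEncoding(), which reports the charset the reader was configured with, not anything derived from the bytes. A reader created without an explicit charset uses the platform default, which on a Turkish Windows installation would typically be Cp1254. A small sketch with a made-up class name shows that the reported name does not depend on the content:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the application described above.
class ReaderEncodingDemo {

    // Returns the encoding name the reader reports for the given bytes.
    // The name comes from the charset passed in, never from the bytes.
    static String reportedEncoding(byte[] raw, String charsetName) {
        InputStreamReader reader = new InputStreamReader(
                new ByteArrayInputStream(raw), Charset.forName(charsetName));
        return reader.getEncoding();
    }

    public static void main(String[] args) {
        // "s-cedilla, g-breve" as CP857 bytes and as ISO-8859-9 bytes.
        byte[] dosBytes = { (byte) 0x9F, (byte) 0xA7 };
        byte[] isoBytes = { (byte) 0xFE, (byte) 0xF0 };

        // Both calls print the same name, whatever the bytes contain.
        System.out.println(reportedEncoding(dosBytes, "ISO-8859-9"));
        System.out.println(reportedEncoding(isoBytes, "ISO-8859-9"));
    }
}
```

If that is the API in use, getting the same answer for every file is expected behaviour and says nothing about the codepage the file was written in.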

How can we detect the character set of the incoming file?
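As far as I know no API can answer this exactly, because a single-byte file carries no marker of its codepage. A heuristic is possible, though. The sketch below is one such guess and not a standard method: it decodes the raw upload under both candidate charsets and picks the one that yields more Turkish letters. The class name, method names and the choice of just two candidates are assumptions made for illustration, and it again assumes the JRE provides "Cp857".

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper; a heuristic sketch, not a reliable detector.
class TurkishCharsetGuess {

    // Turkish letters outside ASCII: c-cedilla, g-breve, dotless i /
    // dotted I, o-umlaut, s-cedilla, u-umlaut, in both cases.
    private static final String TURKISH_LETTERS =
            "\u00E7\u00C7\u011F\u011E\u0131\u0130\u00F6\u00D6\u015F\u015E\u00FC\u00DC";

    // Count how many decoded characters are Turkish letters.
    static int turkishLetterCount(byte[] raw, String charsetName) {
        String decoded =
                Charset.forName(charsetName).decode(ByteBuffer.wrap(raw)).toString();
        int count = 0;
        for (int i = 0; i < decoded.length(); i++) {
            if (TURKISH_LETTERS.indexOf(decoded.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    // Returns "Cp857" or "ISO-8859-9", or null when the evidence is tied
    // (for example a file containing only ASCII).
    static String guess(byte[] raw) {
        int dos = turkishLetterCount(raw, "Cp857");
        int iso = turkishLetterCount(raw, "ISO-8859-9");
        if (dos > iso) {
            return "Cp857";
        }
        if (iso > dos) {
            return "ISO-8859-9";
        }
        return null;
    }

    public static void main(String[] args) {
        // "s-cedilla, g-breve, dotless i" as a DOS tool would write them.
        byte[] upload = { (byte) 0x9F, (byte) 0xA7, (byte) 0x8D };
        System.out.println("guessed charset: " + guess(upload));
    }
}
```

The guess would have to run on the raw bytes of the upload, before any Reader or XML parser has decoded them. It cannot tell ISO-8859-9 from windows-1254, since both place the Turkish letters at the same byte values, and a file with very few Turkish letters gives weak evidence either way.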

Thanks in advance
Mehmet Gunacti

PS: We use Java 1.4 under Windows OS.
 
