SAX, Xerces: parse() and IOException caused by wrong URI encoding?

Pascal Lagassé

Hi,

Environment: Java 1.4.1_02
OS: Windows 2000
XML parser: Xerces-J 2.6.0

The following snippet works without problems as long as the file name
contains only standard ASCII characters (e.g. file.xml).

The XML processing is done as it should be:

------------------------ SNIPPET --------------------------
import java.io.IOException;

import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
// ... (remaining imports elided in the original post)

public class mySaX {

    private mySaxContentHandler m_wch;

    public void parse(String uri) {
        ContentHandler contentHandler = new mySaxContentHandler();

        try {
            XMLReader parser = new SAXParser();
            parser.setContentHandler(contentHandler);
            m_wch = (mySaxContentHandler) contentHandler;
            // SAX treats this argument as a system identifier (URI)
            parser.parse(uri);
            // ...
        } catch (IOException e) {
            System.out.println("Error reading URI: " + uri + " " + e.getMessage());
        } catch (SAXException e) {
            System.out.println("Error while parsing: " + e.getMessage());
        }
    }
}
------------------------ END SNIPPET --------------------------

But when I use a file name such as "para_grisé.xml",
I get:

------------------------- ERROR MESSAGE -------------------
Error reading URI: para_grisé.xml unknown protocol: e
------------------------- END ERROR MESSAGE ---------------
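SAX's `XMLReader.parse(String systemId)` treats its argument as a system identifier (a URI), not as a plain file path, so passing a raw file name only works reliably when that name also happens to be a valid URI. One common way to sidestep the question, not taken from this thread, is to open the file yourself and hand the parser an `InputSource` backed by a stream. This is a minimal sketch under stated assumptions: the class `StreamParse` and its method `countElements` are hypothetical, it uses the JDK's JAXP `SAXParserFactory` instead of instantiating Xerces directly (the `XMLReader` calls are the same), and try-with-resources needs Java 7 or later, whereas the thread used Java 1.4.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: parse a file through a stream so the parser never has
// to interpret the file name itself as a URI.
final class StreamParse {
    static int countElements(File file) throws Exception {
        final int[] count = {0};
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++; // called once per element start tag
            }
        });
        try (InputStream in = new FileInputStream(file)) {
            InputSource source = new InputSource(in);
            // Optional: a properly escaped systemId lets the parser resolve
            // relative references (e.g. an external DTD) and name the file in errors.
            source.setSystemId(file.toURI().toASCIIString());
            reader.parse(source);
        }
        return count[0];
    }
}
```

The file name only ever reaches `FileInputStream`, which takes a platform path, so characters such as é never pass through URI parsing.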

I have never had a problem with such file names before, whether using a
File object or displaying the name in a Swing component.

Has anybody encountered the same problem? How should I re-encode the
URI for parsing?

Thank you very much in advance,

/Pascal Lagassé

Kösel GmbH & Co. KG - Over 400 years of books with system
Wartenseestraße 11 87435 Kempten
http://www.koeselbuch.de
 

Daniel

It is simple: use only a-z, A-Z, 0-9 and ".". No special characters;
all others are not allowed.
 

Pascal Lagassé


Thanks for your answer, but unfortunately I have no control over the
file names. Nevertheless, I found the answer at:

www.w3.org/International/O-URL-code.html

which provides a Java class (URLUTF8Encoder) that encodes URLs as UTF-8.

Input:
para_grisé.xml

After UTF-8 encoding:
para_gris%c3%a9.xml
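To make the encoding step concrete, here is a minimal sketch of UTF-8 percent-encoding. It is not the W3C `URLUTF8Encoder` class, and its set of unescaped characters may differ from that class: the hypothetical `Utf8PercentEncoder.encode` keeps the RFC 3986 unreserved characters (letters, digits, `-`, `_`, `.`, `~`) and replaces every other byte of the name's UTF-8 form with `%` followed by two lowercase hex digits. The character é (U+00E9) is the two UTF-8 bytes 0xC3 0xA9, hence `%c3%a9`. `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not the W3C URLUTF8Encoder): percent-encodes every byte
// of the UTF-8 form of a string except RFC 3986 unreserved ASCII characters.
final class Utf8PercentEncoder {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xff; // treat the byte as unsigned (0..255)
            boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                // Bytes of multi-byte UTF-8 sequences are always >= 0x80, so they land here.
                out.append('%').append(HEX[c >> 4]).append(HEX[c & 0x0f]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The escape below is é, written this way so the source file encoding does not matter.
        System.out.println(encode("para_gris\u00e9.xml")); // prints para_gris%c3%a9.xml
    }
}
```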

Snippet:
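// ---------------------------------------------------------
// Percent-encode only the file name as UTF-8; keep the directory part as-is.
File uriFile = new File(uri);
String parent = uriFile.getParent(); // null when uri has no directory part
String encodedName = URLUTF8Encoder.encode(uriFile.getName());
String encodedUri = (parent == null) ? encodedName : parent + File.separator + encodedName;
parser.parse(encodedUri);
// ---------------------------------------------------------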
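An alternative that relies only on the standard library, not used in this thread, is to let `java.io.File` build an absolute `file:` URI and then take its US-ASCII form with `java.net.URI.toASCIIString()`, which converts non-ASCII characters to UTF-8 and percent-escapes them (with uppercase hex digits). Both `File.toURI()` and `URI.toASCIIString()` have existed since Java 1.4. This is a sketch; the class `FileToSystemId` and its method `systemId` are hypothetical names.

```java
import java.io.File;

// Hypothetical helper: turn a platform file path into an escaped file: URI
// suitable for passing to XMLReader.parse(String systemId).
final class FileToSystemId {
    static String systemId(String path) {
        // toURI() makes the path absolute and uses '/' separators;
        // toASCIIString() percent-escapes non-ASCII characters as UTF-8.
        return new File(path).toURI().toASCIIString();
    }

    public static void main(String[] args) {
        // On Windows this prints something like file:/C:/work/para_gris%C3%A9.xml
        System.out.println(systemId("para_gris\u00e9.xml"));
    }
}
```

In the snippet above, `parser.parse(FileToSystemId.systemId(uri))` could then take the place of the hand-built `encodedUri`. Unlike concatenating `getParent()` and `File.separator`, this also copes with a bare file name that has no directory part and yields the forward slashes a URI path uses.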

Best regards to you all, Java Developers!

/Pascal Lagassé
