Pimousse
Hi everybody,
I'm helping a friend with a parsing problem using JDom. As we speak a
Latin language, our XML files contain characters like "é" or "à".
So far, no problem.
But we have to modify XML files that use encodings that don't support
these characters, such as UTF-8 (but not only this one). In fact, our
company reuses XML files previously developed by another company ("not
Latin"). Inserting data was not a problem, but modifying it now isn't so
easy. With this configuration, JDom throws an exception, even if we add
these lines:
Format format=Format.getPrettyFormat();
format.setEncoding("iso-8859-1");
So we're not able to build a DOM document, and so we can't modify
our documents!
Then we decided to change the line:
<?xml version="1.0" encoding="utf-8"?> (for example)
to something like:
<?xml version="1.0" encoding="iso-8859-1"?>
But as we can't know the encoding before reading the file, we
decided to use a regular expression.
As I'm more skilled in PHP than in Java, I developed this pattern in
PHP (tested and working):
(<\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\?>)
whose match should be replaced by:
\\1iso-8859-1\\3
But I haven't managed to translate it to Java.
Using this syntax:
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(string);
string = m.replaceAll(replace);
where
pattern = "(<\\?xml[^>]+encoding=\")([^>]+)(\"?[^>]+\\?>)";
replace = "\\1iso-8859-1\\3";
does not work...
Can someone help me translate my pattern from PHP syntax to Java
syntax?
Thanks.
PS: I have already read
http://java.sun.com/docs/books/tutorial/extra/regex/index.html ....
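For reference, here is a minimal sketch of what the Java translation might look like, assuming the only thing that differs is the replacement syntax: in Java, Matcher.replaceAll and replaceFirst refer to groups as $1 and $3, while a backslash in the replacement string just escapes the next character, so "\\1iso-8859-1\\3" would insert the literal text 1iso-8859-13. The class name EncodingRewrite and the method forceLatin1 are hypothetical names chosen for this example, and the pattern is slightly tightened ([^\"]+ for the encoding value) compared with the PHP one.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EncodingRewrite {
    // Group 1: everything up to and including encoding="
    // Group 2: the declared encoding (stops at the closing quote)
    // Group 3: the closing quote and the rest of the declaration
    private static final Pattern XML_DECL =
        Pattern.compile("(<\\?xml[^>]+encoding=\")([^\"]+)(\"[^>]*\\?>)");

    static String forceLatin1(String xml) {
        Matcher m = XML_DECL.matcher(xml);
        // Java uses $n (not \n) for group references in the replacement.
        // replaceFirst is enough: a document has one XML declaration.
        return m.replaceFirst("$1iso-8859-1$3");
    }

    public static void main(String[] args) {
        String in = "<?xml version=\"1.0\" encoding=\"utf-8\"?><root/>";
        // prints <?xml version="1.0" encoding="iso-8859-1"?><root/>
        System.out.println(forceLatin1(in));
    }
}
```

Note that this only rewrites the text of the declaration; it does not re-encode the bytes of the file.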