How many illegal characters are there for JDOM?

Carfield Yim

First I saw the exception message " is not legal for a JDOM character
content: 0x0 is not a legal XML character.", so I stripped all "\0"
characters. Then I got " is not legal for a JDOM character content:
0x1 is not a legal XML character." and " is not legal for a JDOM
character content: 0x2 is not a legal XML character.".

So, how many illegal characters are there for JDOM? Is there any easy
way to parse them all?
 
Mayeul

Carfield said:
> First I saw the exception message " is not legal for a JDOM character
> content: 0x0 is not a legal XML character.", so I stripped all "\0"
> characters. Then I got " is not legal for a JDOM character content:
> 0x1 is not a legal XML character." and " is not legal for a JDOM
> character content: 0x2 is not a legal XML character.".
>
> So, how many illegal characters are there for JDOM? Is there any easy
> way to parse them all?

I am actually not sure, as I couldn't find any JDOM reference about it,
but I think it is safe to assume from the error messages that any
illegal XML character is an illegal JDOM character.

U+0000, U+0001 and U+0002 certainly are illegal XML characters, and it
seems a good idea for JDOM to reject them.

According to the XML specification:
(The W3C server is overloaded again; search for the XML specification
on Google, then view the cached page.)
http://209.85.229.132/search?q=cache:fdujgnyF_v4J:www.w3.org/TR/REC-xml/


The valid XML characters match this construction:

Character Range

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
[#x10000-#x10FFFF]
/* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
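
The production above translates almost directly into a range check on a code point. Here is a minimal sketch in plain Java; the names XmlChars, isXmlChar and countIllegal are made up for illustration and are not part of JDOM. It also answers the "how many" question by brute force over U+0000 to U+10FFFF.

```java
final class XmlChars {
    /** True if cp matches the XML 1.0 Char production quoted above. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Counts the code points in U+0000..U+10FFFF that the production rejects. */
    static int countIllegal() {
        int n = 0;
        for (int cp = 0; cp <= 0x10FFFF; cp++) {
            if (!isXmlChar(cp)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countIllegal());
    }
}
```

Counted this way, the rejected set is 29 control characters below U+0020, the 2,048 surrogate code points, plus U+FFFE and U+FFFF: 2,079 in total.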


It's up to you to count whatever isn't in this construction.

> Any easy way to parse all?

Not sure. Excluding surrogate blocks while keeping non-BMP characters
should be tricky with a regexp.
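
That said, with a regex engine that matches by code point it may be less painful than it sounds. A sketch, assuming Java 7 or later (where java.util.regex accepts \x{h...h} for code points above U+FFFF); XmlRegexFilter and stripIllegal are illustrative names, not JDOM API:

```java
import java.util.regex.Pattern;

final class XmlRegexFilter {
    // Negated character class built from the XML 1.0 Char production:
    // it matches any code point that is NOT a legal XML character.
    private static final Pattern ILLEGAL = Pattern.compile(
        "[^\\x09\\x0A\\x0D\\x20-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]");

    /** Returns s with every illegal XML character removed. */
    static String stripIllegal(String s) {
        return ILLEGAL.matcher(s).replaceAll("");
    }
}
```

A well-formed surrogate pair is seen by the matcher as one code point in the last range and is kept, while an unpaired surrogate is seen as a code point in U+D800..U+DFFF and is removed.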

To be honest, I'm kinda wondering what you are trying to build a DOM
from. It's not every day that I have to filter out illegal characters
and am not allowed to just discard the input as invalid.
 
Carfield Yim

> ...
> To be honest, I'm kinda wondering what you are trying to build a DOM
> from. It's not every day that I have to filter out illegal characters
> and am not allowed to just discard the input as invalid.

I cannot control my source, so I would like to discard exactly those
characters from the input source...
 
Mayeul

Carfield said:
> I cannot control my source, so I would like to discard exactly those
> characters from the input source...

I wish you luck, then.

Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks
whether a character is a valid XML character (this same method is
called to raise the error you got).

Note it takes an int, not a char, as parameter. This is because it
handles non-BMP characters. You might want to do that too.
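
To see why the int matters: a character outside the BMP is stored in a Java String as two char values (a surrogate pair), and neither half is a legal XML character by itself; only the combined code point is. A small illustration using only java.lang (CodePointDemo and toCodePoints are hypothetical names, nothing from JDOM):

```java
final class CodePointDemo {
    /** Splits a string into code points, combining valid surrogate pairs. */
    static int[] toCodePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        for (int i = 0, n = 0; i < s.length(); n++) {
            int cp = s.codePointAt(i);     // an int, may be above 0xFFFF
            out[n] = cp;
            i += Character.charCount(cp);  // 2 for a surrogate pair, otherwise 1
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00";        // 'a' followed by U+1F600
        System.out.println(s.length());                 // number of char values
        System.out.println(toCodePoints(s).length);     // number of code points
        System.out.println(Integer.toHexString(toCodePoints(s)[1]));
    }
}
```

Passing each int from such a loop to a code point check is what keeps valid non-BMP characters while still rejecting unpaired surrogates.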
 
Carfield Yim

> I wish you luck, then.
>
> Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks
> whether a character is a valid XML character (this same method is
> called to raise the error you got).
>
> Note it takes an int, not a char, as parameter. This is because it
> handles non-BMP characters. You might want to do that too.

Fixed. I can actually reuse the JDOM API to check whether a character
is valid for an XML document (or JDOM text). Here is the code sample:


final String tempText;
final StringBuilder content = new StringBuilder();
if (item instanceof FileItem)
    tempText = HeadItem.extendedDesc((FileItem) item);
else
    tempText = item.getDesc();

/* adapted from the JDOM library... */
for (int i = 0, len = tempText.length(); i < len; ) {
    // Read a whole code point, so a valid surrogate pair is tested
    // (and kept) as one character instead of two illegal halves.
    final int cp = tempText.codePointAt(i);
    i += Character.charCount(cp);
    // An unpaired surrogate comes back as itself, which is not a
    // legal XML character, so it is dropped here as well.
    if (Verifier.isXMLCharacter(cp)) {
        content.appendCodePoint(cp);
    }
}
 
