Velocity Reviews


Carfield Yim 10-28-2009 03:46 PM

How many illegal characters are there for JDOM?
 
First I got the exception message " is not legal for a JDOM character
content: 0x0 is not a legal XML character.", so I stripped all "\0"
characters. Then I got " is not legal for a JDOM character content:
0x1 is not a legal XML character." and " is not legal for a JDOM
character content: 0x2 is not a legal XML character.".

So... how many illegal characters are there for JDOM? Is there an easy way to filter them all out?

Mayeul 10-28-2009 04:34 PM

Carfield Yim wrote:
> First I see exception message " is not legal for a JDOM character
> content: 0x0 is not a legal XML character.", ok, then I trim all "\0"
> character. Then, I get " is not legal for a JDOM character content:
> 0x1 is not a legal XML character." and " is not legal for a JDOM
> character content: 0x2 is not a legal XML character.".
>
> So.... how many illegal character for JDOM? Any easy way to parse all?


I am actually not sure, as I couldn't find any JDOM reference about it,
but from the error messages I think it is safe to assume that any
illegal XML character is an illegal JDOM character.

U+0, U+1 and U+2 are certainly illegal XML characters, and it seems a good
idea for JDOM to reject them.

According to the XML specification
(the W3C server is overloaded again; search Google for the XML specification,
then view the cached page):
http://209.85.229.132/search?q=cache...rg/TR/REC-xml/


The valid XML characters match this construction:

Character Range

    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */


It's up to you to count whatever isn't in this construction.
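For what it's worth, the count can be done mechanically. The sketch below encodes the Char production above as a predicate on an int code point and tallies every code point from U+0000 to U+10FFFF that fails it. The class and method names are made up for illustration. Working it out by hand gives the same figure: 29 C0 control characters (everything below U+0020 except tab, LF and CR), 2048 surrogate code points (U+D800 to U+DFFF), plus U+FFFE and U+FFFF, so 2079 in total.

```java
/** Sketch: the XML 1.0 Char production as a code-point predicate, plus a count of what it excludes. */
final class XmlCharCount {

    /** True if c matches Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. */
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Counts the code points in [U+0000, U+10FFFF] that are not legal XML 1.0 characters. */
    static int countIllegal() {
        int illegal = 0;
        for (int c = 0; c <= Character.MAX_CODE_POINT; c++) {
            if (!isXmlChar(c)) {
                illegal++;
            }
        }
        return illegal;
    }

    public static void main(String[] args) {
        System.out.println(countIllegal()); // prints 2079
    }
}
```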

> Any easy way to parse all?


Not sure. Excluding the surrogate blocks while keeping non-BMP characters
would be tricky with a regexp.
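In Java it can be done, though the subtlety is real. A sketch, relying on java.util.regex matching input by code point: the negated class lists the allowed ranges, and the supplementary range is written as surrogate pairs at the Java-string level ("\uD800\uDC00" is U+10000, "\uDBFF\uDFFF" is U+10FFFF), which is the usual way this pattern is written. A well-formed pair should then be seen as one supplementary character and kept, while an unpaired surrogate is seen on its own and removed. The class name is hypothetical, and it is worth testing against your own data before trusting it.

```java
import java.util.regex.Pattern;

/** Sketch: remove characters outside the XML 1.0 Char production in one regex pass. */
final class XmlRegexFilter {

    // Negated class of the allowed ranges. Tab, CR and LF are written as Java escapes.
    // The last range uses surrogate-pair escapes, so the pattern string itself
    // contains the supplementary characters U+10000 and U+10FFFF.
    private static final Pattern ILLEGAL_XML_CHARS = Pattern.compile(
        "[^\t\r\n\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF]");

    /** Returns s with every character outside the allowed ranges deleted. */
    static String strip(String s) {
        return ILLEGAL_XML_CHARS.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F600));
        System.out.println(strip("a\0b" + smile + "\uD800c"));
    }
}
```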

To be honest, I'm kind of wondering what you are trying to build a DOM
from. It's not every day that I have to filter out illegal characters and
am not allowed to just discard the input as invalid.

--
Mayeul

Carfield Yim 10-28-2009 04:42 PM

..
>
> To be honest, I'm kinda wondering what you are trying to build a DOM
> from. It's not everyday that I have to filter out illegal characters and
> am disallowed to just discard the input as invalid.


I cannot control my source, so yes, discarding those characters from
the input source is exactly what I would like to do...

Mayeul 10-28-2009 04:57 PM

Carfield Yim wrote:
> .
>> To be honest, I'm kinda wondering what you are trying to build a DOM
>> from. It's not everyday that I have to filter out illegal characters and
>> am disallowed to just discard the input as invalid.

>
> I cannot control my source so exactly Iwould like to discard those
> characters from the input source...


I wish you luck, then.

Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks whether
a character is a valid XML character (it is the same check that produces
the error you got).

Note that it takes an int, not a char, as its parameter. This is because it
handles non-BMP characters. You might want to do the same.
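To see why the int parameter matters, here is a small sketch using a hypothetical isXmlChar(int) written straight from the Char production (it is not JDOM's code). A character outside the BMP, such as U+1F600, takes two chars in a Java String; each half is a surrogate and fails the check on its own, while the decoded code point passes.

```java
/** Sketch: checking char by char rejects both halves of a valid surrogate pair. */
final class CodePointDemo {

    // Hypothetical stand-in for a JDOM-style check, written from the XML 1.0 Char production.
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    public static void main(String[] args) {
        String emoji = new String(Character.toChars(0x1F600)); // one character, two chars
        System.out.println(emoji.length());                  // 2
        System.out.println(isXmlChar(emoji.charAt(0)));      // false: high surrogate alone
        System.out.println(isXmlChar(emoji.charAt(1)));      // false: low surrogate alone
        System.out.println(isXmlChar(emoji.codePointAt(0))); // true: U+1F600 as a whole
    }
}
```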

--
Mayeul

Carfield Yim 12-01-2009 03:20 AM


> I wish you lucks, then.
>
> Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM will check
> a character is a valid XML character (this same method is called to
> raise the error you got.)
>
> Note it takes an int, not a char, as parameter. This is because it
> handles non-BMP characters. You might want to do that too.


Fixed. It turns out I can reuse the JDOM API to check whether a character is
valid for an XML document or JDOM text. Here is the code sample:


import org.jdom.Verifier;

final String tempText;
if (item instanceof FileItem) {
    tempText = HeadItem.extendedDesc((FileItem) item);
} else {
    tempText = item.getDesc();
}

// Adapted from the JDOM library (Verifier): keep only legal XML characters.
// Walk the text by code point, not by char: a well-formed surrogate pair is
// decoded into one supplementary character and checked as a whole, while an
// unpaired surrogate comes back as itself and fails isXMLCharacter().
final StringBuilder content = new StringBuilder(tempText.length());
for (int i = 0; i < tempText.length(); ) {
    final int cp = tempText.codePointAt(i);
    if (Verifier.isXMLCharacter(cp)) {
        content.appendCodePoint(cp);
    }
    i += Character.charCount(cp);
}
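To try the loop without JDOM on the classpath, here is a self-contained sketch of the same filtering. isXmlChar is a hypothetical stand-in for Verifier.isXMLCharacter(int), written from the XML 1.0 Char production, and stripIllegal is likewise a made-up name.

```java
/** Self-contained sketch of the filtering loop, without the JDOM dependency. */
final class XmlTextFilter {

    // Hypothetical stand-in for org.jdom.Verifier.isXMLCharacter(int).
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD)
            || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Returns text with every code point that is not a legal XML 1.0 character removed. */
    static String stripIllegal(String text) {
        final StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // A valid surrogate pair decodes to one supplementary code point;
            // an unpaired surrogate comes back as itself and is then rejected.
            final int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String smile = new String(Character.toChars(0x1F600));
        String dirty = "a\0b\1c\2" + smile + "\uD800d";
        System.out.println(stripIllegal(dirty).equals("abc" + smile + "d")); // true
    }
}
```

Running text through something like stripIllegal before handing it to a JDOM Text or Element should avoid the "is not a legal XML character" errors from the original question, assuming the stand-in matches JDOM's own check.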


All times are GMT.