How many illegal characters are there for JDOM?
At first I got the exception message "... is not legal for a JDOM character content: 0x0 is not a legal XML character.", so I removed all "\0" characters. Then I got "... is not legal for a JDOM character content: 0x1 is not a legal XML character." and "... is not legal for a JDOM character content: 0x2 is not a legal XML character.". So... how many illegal characters are there for JDOM? Is there an easy way to filter them all out?
Re: How many illegal characters are there for JDOM?
Carfield Yim wrote:
> At first I got the exception message "... is not legal for a JDOM character
> content: 0x0 is not a legal XML character.", so I removed all "\0"
> characters. Then I got "... is not legal for a JDOM character content:
> 0x1 is not a legal XML character." and "... is not legal for a JDOM
> character content: 0x2 is not a legal XML character.".
>
> So... how many illegal characters are there for JDOM? Is there an easy way to filter them all out?

I am actually not sure, as I couldn't find any JDOM reference about it, but I think it is safe to assume from the error messages that any illegal XML character is an illegal JDOM character. U+0, U+1 and U+2 certainly are illegal XML characters, and it seems a good idea for JDOM to reject them.

According to the XML specification (the W3C server is overloaded again, so search for the XML specification on Google and view the cached page):
http://209.85.229.132/search?q=cache...rg/TR/REC-xml/

The valid XML characters match this construction:

    Character Range
    Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
             /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

It's up to you to count whatever isn't in this construction.

> Is there an easy way to filter them all out?

Not sure. Excluding the surrogate blocks while keeping non-BMP characters would be tricky with a regexp.

To be honest, I'm kinda wondering what you are trying to build a DOM from. It's not every day that I have to filter out illegal characters instead of being allowed to just discard the input as invalid.

--
Mayeul
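To put a number on "whatever isn't in this construction", here is a minimal sketch (not from the thread) that walks every Unicode code point from U+0000 to U+10FFFF and tests it against the Char production quoted above. The class and method names (`CountIllegalXmlChars`, `isXmlChar`, `countIllegal`) are hypothetical; in JDOM code you would call `Verifier.isXMLCharacter(int)` instead of the local predicate.

```java
/**
 * Counts the code points in U+0000..U+10FFFF that the XML 1.0 Char
 * production does NOT allow. Hypothetical helper for illustration.
 */
final class CountIllegalXmlChars {

    // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    /** Number of illegal code points in [from, to], both ends inclusive. */
    static int countIllegal(int from, int to) {
        int n = 0;
        for (int c = from; c <= to; c++) {
            if (!isXmlChar(c)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        int controls = countIllegal(0x0, 0x1F);        // C0 controls except tab, LF, CR
        int surrogates = countIllegal(0xD800, 0xDFFF); // the surrogate blocks
        int nonChars = countIllegal(0xFFFE, 0xFFFF);   // U+FFFE and U+FFFF
        int total = countIllegal(0x0, Character.MAX_CODE_POINT);
        System.out.println("C0 controls: " + controls);
        System.out.println("surrogates:  " + surrogates);
        System.out.println("FFFE/FFFF:   " + nonChars);
        System.out.println("total:       " + total);
    }
}
```

This should report 29 control characters (0x0 through 0x1F minus tab, LF and CR, so the 0x0, 0x1 and 0x2 from the error messages are among them), 2048 surrogate code units and the 2 non-characters U+FFFE and U+FFFF, for a total of 2079.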
Re: How many illegal characters are there for JDOM?
> To be honest, I'm kinda wondering what you are trying to build a DOM
> from. It's not every day that I have to filter out illegal characters
> instead of being allowed to just discard the input as invalid.

I cannot control my source, so yes, I would like to discard those characters from the input.
Re: How many illegal characters are there for JDOM?
Carfield Yim wrote:
>> To be honest, I'm kinda wondering what you are trying to build a DOM
>> from. It's not every day that I have to filter out illegal characters
>> instead of being allowed to just discard the input as invalid.
>
> I cannot control my source, so yes, I would like to discard those
> characters from the input.

I wish you luck, then.

Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks whether a character is a valid XML character (it is the same check behind the error you got).

Note that it takes an int, not a char, as its parameter. This is because it handles non-BMP characters. You might want to do that too.

--
Mayeul
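A minimal sketch of that code-point approach, assuming plain Java 5+ and no JDOM on the classpath: the method walks the string with `String.codePointAt`, so a valid surrogate pair is seen as one non-BMP code point, while an unpaired surrogate comes back on its own and is rejected. The names `XmlCharFilter`, `isXmlChar` and `stripIllegalXmlChars` are hypothetical; the local predicate mirrors the Char production, and `Verifier.isXMLCharacter(int)` could be called in its place.

```java
/** Hypothetical helper: drops every code point the XML 1.0 Char production forbids. */
final class XmlCharFilter {

    // Local copy of the Char production; JDOM's Verifier.isXMLCharacter(int)
    // performs the same check.
    static boolean isXmlChar(int c) {
        return c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD)
                || (c >= 0x10000 && c <= 0x10FFFF);
    }

    static String stripIllegalXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            // codePointAt combines a valid surrogate pair into one code point;
            // an unpaired surrogate is returned as its own (illegal) value.
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by 2 for a pair, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "a", NUL, U+0001, "b", U+1F600 as a surrogate pair, a lone high surrogate, "c"
        String input = "a\u0000\u0001b\uD83D\uDE00\uD83Dc";
        String cleaned = stripIllegalXmlChars(input);
        System.out.println(cleaned.equals("ab\uD83D\uDE00c")); // true
    }
}
```

A char-by-char loop would see U+1F600 as two surrogate halves, neither of which is a legal XML character on its own, and would drop it; working on code points keeps it, which is why isXMLCharacter takes an int.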
Re: How many illegal characters are there for JDOM?
> I wish you luck, then.
>
> Not sure it helps, but Verifier.isXMLCharacter(int) from JDOM checks
> whether a character is a valid XML character (it is the same check
> behind the error you got).
>
> Note that it takes an int, not a char, as its parameter. This is because
> it handles non-BMP characters. You might want to do that too.
>
> --
> Mayeul

Fixed. It turns out I can reuse JDOM's API to check whether a character is valid for an XML document, or for JDOM text. Here is the code sample:

    final String tempText;
    if (item instanceof FileItem)
        tempText = HeadItem.extendedDesc((FileItem) item);
    else
        tempText = item.getDesc();

    // Loop adapted from the JDOM library (Verifier is JDOM's Verifier class):
    // copy only characters that are legal in XML, silently dropping the rest.
    final StringBuilder content = new StringBuilder(tempText.length());
    for (int i = 0, len = tempText.length(); i < len; ++i) {
        final char ch = tempText.charAt(i);
        if (Verifier.isHighSurrogate(ch)) {
            // A high surrogate is legal only as the first half of a pair.
            if (i + 1 < len && Verifier.isLowSurrogate(tempText.charAt(i + 1))) {
                // Valid pair: a character in U+10000..U+10FFFF, always legal in XML.
                content.append(ch).append(tempText.charAt(i + 1));
                ++i; // skip the low surrogate just copied
            }
            // Otherwise the high surrogate is unpaired: drop it.
            continue;
        }
        // Lone low surrogates and other illegal BMP characters fail this check.
        if (Verifier.isXMLCharacter(ch)) {
            content.append(ch);
        }
    }