Velocity Reviews - Computer Hardware Reviews


Incorrect parsing of special characters

 
 
Dario Di Bella
 
      06-17-2004
Hi all,
I hope someone can help me with this. I need to parse the following XML:

....
<area name="promotore">
<item id="004" code="003" description="attivita promotore">
<![CDATA[»&nbsp;Attività&nbsp;Promotore]]>
</item>
</area>
....

As you can see, I used the CDATA section to include special characters.
Unfortunately, when I parse the file, the "item" element content turns out
to be:

Â»&nbsp;AttivitÃ &nbsp;Promotore

i.e. the "Â" character is inserted at the beginning of the string and
the "à" character is translated into "Ã ".

I'm using the javax.xml.parsers.DocumentBuilder parser.

Has anyone got any clue? Thanks.

Dario
 
 
 
 
 
Thomas Weidenfeller
 
      06-18-2004
Dario Di Bella wrote:

> <![CDATA[»&nbsp;Attività&nbsp;Promotore]]>
> Â»&nbsp;AttivitÃ &nbsp;Promotore
>
> i.e. the "Â" character is inserted at the beginning of the string and
> the "à" character is translated into "Ã ".


Check your charset encoding. This looks very much as if the encoding in
which the XML comes and the encoding used to read it don't match.

/Thomas
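
A minimal sketch of the kind of mismatch described above, assuming the text was written as UTF-8 and then decoded as ISO-8859-1 (one plausible combination, not confirmed by the original post). The class and method names are made up for illustration. Decoded this way, "»" gains a stray U+00C2 in front of it and "à" turns into two characters, U+00C3 followed by a non-breaking space.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    /** Encodes text as UTF-8, then (wrongly) decodes the bytes as ISO-8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Escapes keep this source file ASCII-only: 00BB is the right guillemet,
        // 00E0 is a with grave accent.
        String original = "\u00BB Attivit\u00E0";
        String garbled = misdecode(original);

        // Print code points instead of the characters themselves, so the
        // console's own encoding cannot distort the result.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

Each non-ASCII character occupies two bytes in UTF-8, and ISO-8859-1 maps every byte to exactly one character, which is why one character becomes two.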
 
 
 
 
 
Michael Borgwardt
 
      06-18-2004
Dario Di Bella wrote:
> As you can see, I used the CDATA section to include special characters.
> Unfortunately, when I parse the file, the "item" element content turns out
> to be:
>
> Â»&nbsp;AttivitÃ &nbsp;Promotore
>
> i.e. the "Â" character is inserted at the beginning of the string and
> the "à" character is translated into "Ã ".


Does your document correctly declare its encoding? If you specify
none, the default is UTF-8 whereas Windows text editors usually
default to CP1252. Trying to parse CP1252-encoded text as UTF-8
could easily lead to the weirdness you describe.
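
A rough sketch of the point about declarations, using javax.xml.parsers.DocumentBuilder as in the original question. The class name, the helper parseItemText, and the single-element sample document are assumptions made for illustration. The same ISO-8859-1 bytes are parsed twice: once with no XML declaration, so the parser falls back to UTF-8, and once with a declaration that matches the bytes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class EncodingDeclarationDemo {
    // Escapes keep this source file ASCII-only: 00BB is the right guillemet,
    // 00E0 is a with grave accent.
    static final String ITEM =
        "<item><![CDATA[\u00BB&nbsp;Attivit\u00E0&nbsp;Promotore]]></item>";

    /** Parses the bytes and returns the item text, or null if the parser rejects them. */
    static String parseItemText(byte[] xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            System.out.println("parse failed: " + e);
            return null;
        }
    }

    public static void main(String[] args) {
        // No declaration: the parser assumes UTF-8, which these bytes are not.
        byte[] undeclared = ITEM.getBytes(StandardCharsets.ISO_8859_1);

        // Declaration matches the encoding actually used to produce the bytes.
        byte[] declared =
            ("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + ITEM)
                .getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("undeclared: " + parseItemText(undeclared));
        System.out.println("declared:   " + parseItemText(declared));
    }
}
```

Only the declared variant should return the original text. What happens to the undeclared one depends on the parser, which may reject the bytes outright rather than return garbled text.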
 
 
Dario Di Bella
 
      06-18-2004
Bjoern/Michael/Thomas,

I solved this issue by declaring a different encoding ("iso-8859-1"
instead of "utf-8"). Thank you very much for your help, and excuse me
for bothering you with a trivial problem.

Best regards.

Dario.
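
For cases where the file itself cannot be edited, one alternative (not the fix used in this thread, just a sketch) is to decode the bytes yourself and hand the parser a character stream. The class name and the helper readItemText are hypothetical. Because the parser receives characters rather than bytes, the encoding named in the XML declaration no longer affects decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ExplicitCharsetDemo {
    /** Decodes the stream with the given charset, then parses the characters. */
    static String readItemText(InputStream in, Charset charset) throws Exception {
        Reader reader = new InputStreamReader(in, charset);
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The declaration says utf-8, but the bytes are really ISO-8859-1.
        // Escapes keep this source file ASCII-only.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
            + "<item><![CDATA[\u00BB&nbsp;Attivit\u00E0&nbsp;Promotore]]></item>";
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1);

        String text = readItemText(
            new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        System.out.println(text);
    }
}
```

Correcting the declaration in the file, as done above, is usually the cleaner fix, since it keeps the document self-describing for every other consumer.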
 