How to parse XML which contains & in the text ?

 
 
sohan.soni@gmail.com
      02-14-2007
Hi,


XML file content is:

<?xml version="1.0"?>
<!DOCTYPE RECORD SYSTEM ".\RECORD.dtd">
<RECORD EXT_SOURCE="DEFAULT" TEMPLATE="GAS_POOL_POINTS">
  <TABLE_NAME>GAS_POOL_POINTS</TABLE_NAME>
  <COLUMN>
    <COLUMN_NAME>GP_POOL</COLUMN_NAME>
    <PRIMARY_KEY>Y</PRIMARY_KEY>
    <COLUMN_VALUE>Some&Value</COLUMN_VALUE>
  </COLUMN>
</Record>

When parsing this XML file with Java code (i.e. converting the XML
document to a String), I get the following exception:

org.xml.sax.SAXParseException: Next character must be ";" terminating
reference to entity "Value".



I think some change is needed in the DTD so that an & in the XML text
is treated as a literal character instead of the start of an entity
reference.

In addition, the XML content is not under our control.

Please reply if anybody knows how to handle this.

 
Daniel Dyer
      02-14-2007
On Wed, 14 Feb 2007 11:31:18 -0000, <sohan.soni@gmail.com> wrote:

> When Parsing (i.e. converting this XML doc to String) this XML file
> using Java code, I am getting following exception.
>
> org.xml.sax.SAXParseException: Next character must be ";" terminating
> reference to entity "Value".
>


Section 2.4 of the XML 1.0 specification:

"The ampersand character (&) and the left angle bracket (<) MUST NOT
appear in their literal form, except when used as markup delimiters, or
within a comment, a processing instruction, or a CDATA section. If they
are needed elsewhere, they MUST be escaped using either numeric character
references or the strings "&amp;" and "&lt;" respectively. The right angle
bracket (>) may be represented using the string "&gt;", and MUST, for
compatibility, be escaped using either "&gt;" or a character reference
when it appears in the string "]]>" in content, when that string is not
marking the end of a CDATA section."

> I think there is some changes/modification needed in DTD to treat the
> string in XML which contains & as a literal, instead of expecting some
> entity.


You can't fix this in the DTD. The XML is not well-formed, and the
parser is correct to reject it.
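
To see the rejection in isolation, here is a minimal sketch (not from the original post; the class and method names are made up, and it assumes the JAXP parser bundled with the JDK). It needs no DTD at all, which suggests the DTD is not where the problem lies:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the original poster's code.
class BareAmpersandDemo {

    // Returns true if the JDK's default JAXP parser accepts the text as XML.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler throws on fatal errors and prints nothing itself.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors arrive as SAXParseException (a SAXException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bare ampersand: rejected.
        System.out.println(isWellFormed("<COLUMN_VALUE>Some&Value</COLUMN_VALUE>"));     // false
        // Escaped ampersand: accepted.
        System.out.println(isWellFormed("<COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>")); // true
    }
}
```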

> Adding to this, XML content is not under our control.


Unfortunately, the only rational fix *is* to change the XML. Either use
&amp; or wrap the element data in a CDATA section. If the XML is
controlled by a third party, it would be reasonable to ask them to
change it, since it is not really XML at all if it is not well-formed.
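
Both fixes give the application the same text. A minimal sketch, assuming the JDK's built-in DOM parser (the class name and the one-element documents are invented for illustration):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original poster's code.
class EscapedAmpersandDemo {

    // Parses the XML and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        // Option 1: escape the ampersand as &amp;
        String escaped = "<COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>";
        // Option 2: wrap the element data in a CDATA section
        String cdata = "<COLUMN_VALUE><![CDATA[Some&Value]]></COLUMN_VALUE>";

        System.out.println(textOf(escaped)); // Some&Value
        System.out.println(textOf(cdata));   // Some&Value
    }
}
```

Either way the parser hands back the literal string Some&Value, so the consuming code does not need to know which form the producer chose.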

Dan.

--
Daniel Dyer
http://www.uncommons.org
 
Alex Hunsley
      02-15-2007

As the previous reply said, this is not well-formed XML. It shouldn't
contain a 'naked' ampersand like that.
Do you have any chance at all to speak to the producer of this XML? It's
very reasonable to ask them to fix it. If you can't ask them to fix it,
then how about:

1) Put in a fix yourself, e.g. do a search-and-replace kludge on the
content before the XML parser gets it: replace each naked '&' with
'&amp;' (and handle any other nasty characters that crop up).
2) At least tell the party producing the XML that it is broken. You may
help someone else down the line by doing this, if not yourself.
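
Option 1 could be sketched roughly as follows. This is only an illustration: the class name, method names and regular expression are assumptions, not a vetted solution. The pattern escapes any '&' that is not already the start of something shaped like an entity or character reference. It is still a kludge: it would also rewrite ampersands inside CDATA sections and comments (where they are legal), and it leaves text such as "AT&T;" alone because that looks like a reference.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper; not part of the original poster's code.
class AmpersandKludge {

    // Matches '&' NOT followed by "name;", "#digits;" or "#xhex;".
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:[A-Za-z_:][A-Za-z0-9_.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces each bare ampersand with the &amp; entity reference.
    static String escapeBareAmpersands(String rawXml) {
        return BARE_AMPERSAND.matcher(rawXml).replaceAll("&amp;");
    }

    // Repairs the raw text, parses it, and returns the root element's text.
    static String textAfterFix(String rawXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(escapeBareAmpersands(rawXml))));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("still not parseable after the fix", e);
        }
    }

    public static void main(String[] args) {
        String broken = "<COLUMN_VALUE>Some&Value</COLUMN_VALUE>";
        System.out.println(escapeBareAmpersands(broken)); // <COLUMN_VALUE>Some&amp;Value</COLUMN_VALUE>
        System.out.println(textAfterFix(broken));         // Some&Value
    }
}
```

Existing references such as &amp;, &lt; or &#38; are left untouched, so running the repair twice does not double-escape anything.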

lex

 
sohan.soni@gmail.com
      02-18-2007

Thanks Daniel,
That info really helped.

Regards
Sohan

 
sohan.soni@gmail.com
      02-18-2007

Thanks Lex,

Sohan

 