Velocity Reviews - Computer Hardware Reviews

Problem w/ DocumentBuilder parse method

 
 
John L.
      12-30-2012
I'm pre-processing a file in an attempt to use the subject method, and receive the following error:

[Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
at Extract.CmdLine(Extract.java:144)
at Extract.main(Extract.java:79)

The pertinent portion of the file being parsed follows:

[45]<div>
[46]<input type="hidden" name="cx" value="partner-pub-5436175752152469:m8vqbgi2n21" />
[47]<input type="hidden" name="cof" value="FORID:10" />
[48]<input type="hidden" name="ie" value="ISO-8859-1" />
[49]<input type="text" name="q" size="55" />
[50]<input type="submit" name="sa" value="PCM Search" />
[51] &nbsp; &nbsp; &nbsp; &nbsp; </div >

What is the required declaration syntax for &nbsp; to allow the file to be parsed?

Thanks in advance for your time and consideration.
 
 
 
 
 
Arne Vajhøj
 
      12-30-2012
On 12/30/2012 2:30 PM, John L. wrote:
> I'm pre-processing a file in an attempt to use the subject method, and receive the following error:
>
> [Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
> Exception in thread "main" org.xml.sax.SAXParseException: The entity "nbsp" was referenced, but not declared.
> at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
> at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
> at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
> at Extract.CmdLine(Extract.java:144)
> at Extract.main(Extract.java:79)
>
> The pertinent portion of the file being parsed follows:
>
> [45]<div>
> [46]<input type="hidden" name="cx" value="partner-pub-5436175752152469:m8vqbgi2n21" />
> [47]<input type="hidden" name="cof" value="FORID:10" />
> [48]<input type="hidden" name="ie" value="ISO-8859-1" />
> [49]<input type="text" name="q" size="55" />
> [50]<input type="submit" name="sa" value="PCM Search" />
> [51] &nbsp; &nbsp; &nbsp; &nbsp; </div >
>
> What is the required declaration syntax for &nbsp; to allow the file to be parsed?


Entities should be defined in the DTD.

The above looks like XHTML, so it may work if you add a proper
DOCTYPE at the top (I think the XHTML DTD defines nbsp).

Arne


 
 
 
 
 
Roedy Green
 
      12-31-2012
On Sun, 30 Dec 2012 11:30:24 -0800 (PST), "John L." wrote:

>The entity "nbsp" was referenced, but not declared.


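XML predefines only five entities (&lt;, &gt;, &amp;, &apos; and
&quot;), and &nbsp; is not one of them. You are expected to write the
character directly in the document's encoding (UTF-8, for example),
use a numeric character reference such as &#160;, or formally declare
the entity in a DTD.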

see http://mindprod.com/jgloss/xml.html#AWKWARD
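
Since the original poster is already pre-processing the file, one way to act on this is to swap the HTML-only entity for its numeric character reference before handing the text to the parser. The following is a minimal sketch under that assumption; the class and method names (NbspPreprocess, replaceNbsp, parse) are made up for illustration, and a plain string replacement would also rewrite any literal "&nbsp;" inside a CDATA section.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original post.
class NbspPreprocess {

    // Replace the undeclared entity with the numeric character
    // reference for U+00A0 (no-break space), which needs no DTD.
    static String replaceNbsp(String xml) {
        return xml.replace("&nbsp;", "&#160;");
    }

    // Parse an XML string with the default (non-validating) DocumentBuilder.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String raw = "<div>a&nbsp;b</div>";
        Document doc = parse(replaceNbsp(raw));
        // The text node now holds a real no-break space between a and b.
        String text = doc.getDocumentElement().getTextContent();
        System.out.println(text.length());
    }
}
```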
--
Roedy Green Canadian Mind Products http://mindprod.com
Students who hire or con others to do their homework are as foolish
as couch potatoes who hire others to go to the gym for them.
 
 
Stanimir Stamenkov
 
      01-01-2013
Sun, 30 Dec 2012 11:30:24 -0800 (PST), /John L./:

> I'm pre-processing a file in an attempt to use the subject method, and receive the following error:
>
> [Fatal Error] EXTRACT.TMP:51:23: The entity "nbsp" was referenced, but not declared.
> [...]
> What is the required declaration syntax for &nbsp; to allow the file to be parsed?


As Arne Vajhøj points out in another reply, there should be an XHTML
DOCTYPE declaration at the beginning of the document. Browsers
usually have no problem processing XHTML that contains entity
references from the XHTML DTD, even without a DOCTYPE declaration,
because either:

1. The document is served as text/html, which is not processed as
XML at all; or

2. Browsers keep a local copy of the XHTML DTD and associate it
automatically based on the content type application/xhtml+xml, or
on xmlns="http://www.w3.org/1999/xhtml" on the root html element.

If the document you're trying to parse is under your control, you could:

1. Add the XHTML DOCTYPE declaration manually:

<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

or even:

<!DOCTYPE html
SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

You may still want to supply an EntityResolver [1] to serve this
DTD from a local resource;

2. Add a DOCTYPE with a local subset containing just the necessary
entity declarations, like:

<!DOCTYPE html [
<!ENTITY nbsp "&#160;">
]>
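
The local-subset option can be tried out with a few lines of Java. This is only a sketch: the class name InternalSubsetDemo and the helper parseWithNbsp are invented for the example, and it assumes the input has no XML declaration or DOCTYPE of its own, since the declaration is simply prepended. The default DocumentBuilder is non-validating, so the DOCTYPE name (html here, as above) does not have to match the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
class InternalSubsetDemo {

    // DOCTYPE with an internal subset that declares only nbsp.
    static final String DOCTYPE =
            "<!DOCTYPE html [\n"
          + "<!ENTITY nbsp \"&#160;\">\n"
          + "]>\n";

    // Prepend the DOCTYPE and parse; assumes the input carries no
    // XML declaration or DOCTYPE of its own.
    static Document parseWithNbsp(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(DOCTYPE + xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseWithNbsp("<div> &nbsp; &nbsp; </div>");
        // Each &nbsp; reference is expanded to U+00A0 in the text content.
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```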

If you're parsing documents that have no DOCTYPE declaration and
are not under your control, you may supply an EntityResolver2
implementation. That interface extends EntityResolver with a method,
getExternalSubset, for just that purpose:

http://docs.oracle.com/javase/6/docs...lang.String%29
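
A rough sketch of that approach follows. The class name NbspSubsetResolver and its parse helper are invented for illustration. Whether getExternalSubset is actually consulted depends on the parser implementation; the sketch assumes the JDK's built-in Xerces-derived parser, which is expected to recognise an EntityResolver2 passed to setEntityResolver.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.ext.EntityResolver2;

// Hypothetical resolver, not part of the original post.
class NbspSubsetResolver implements EntityResolver2 {

    // Called for documents without a DOCTYPE: supply an external
    // subset that declares just the nbsp entity.
    @Override
    public InputSource getExternalSubset(String name, String baseURI) {
        return new InputSource(new StringReader("<!ENTITY nbsp \"&#160;\">"));
    }

    // Returning null leaves ordinary entity resolution to the parser.
    @Override
    public InputSource resolveEntity(String name, String publicId,
                                     String baseURI, String systemId) {
        return null;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        return null;
    }

    // Parse an XML string with this resolver installed.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new NbspSubsetResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<div> &nbsp; &nbsp; </div>");
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

The input document is left untouched here, which is the point of this variant: the declaration is supplied entirely from the parsing side.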

[1]
http://docs.oracle.com/javase/6/docs...ityResolver%29
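
For the EntityResolver in [1], a sketch might look like the following. The class name LocalDtdResolver is made up, and STUB_DTD is only a stand-in so the example runs by itself: a real setup would return a stream over a local copy of xhtml1-strict.dtd, and would also have to resolve the three entity files that DTD pulls in (xhtml-lat1.ent, xhtml-symbol.ent, xhtml-special.ent).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical resolver, not part of the original post.
class LocalDtdResolver implements EntityResolver {

    static final String XHTML_STRICT =
            "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd";

    // Stand-in for the contents of a locally stored DTD (assumption).
    static final String STUB_DTD = "<!ENTITY nbsp \"&#160;\">";

    // Answer requests for the XHTML DTD locally instead of over the network.
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (XHTML_STRICT.equals(systemId)) {
            InputSource source = new InputSource(new StringReader(STUB_DTD));
            source.setSystemId(systemId);
            return source;
        }
        return null; // anything else: default resolution
    }

    // Parse an XML string with this resolver installed.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setEntityResolver(new LocalDtdResolver());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" \""
                + XHTML_STRICT + "\">"
                + "<html><body>a&nbsp;b</body></html>";
        Document doc = parse(xml);
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```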

--
Stanimir
 