Velocity Reviews - Computer Hardware Reviews


SAX succeeds, but StAX fails

 
 
Kai Schlamp
      03-06-2008
Hi!

I tried to parse PubMed (a biomedical article database) output with both SAX and
StAX. StAX fails, and I am not sure why (see the exception
below).
Why does SAX succeed when StAX doesn't?
The XML document seems to be fine (see
http://www.ncbi.nlm.nih.gov/entrez/e...33&retmode=xml)
Any suggestions?

Kai

StAX example:
String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
URL url = new URL(address);

XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader parser =
    factory.createXMLStreamReader(url.openConnection().getInputStream());

while (parser.hasNext()) {
    switch (parser.getEventType()) {
        // (event handling omitted in this minimal example)
    }
    parser.next();
}

Error message:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the
declaration of element type "PubMedPubDate".
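For comparison, a StAX pull loop is often written so that `next()` itself supplies the event type, instead of calling `getEventType()` and then `next()` separately. The sketch below is a minimal illustration, not the poster's code: it reads from an in-memory string rather than the PubMed URL (an assumption, so it runs offline), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: returns the local names of all start elements, in document order.
final class StaxPullLoop {

    static List<String> startElementNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                // The reader starts on START_DOCUMENT; each next() returns the new current event.
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close(); // frees parser resources; does not close the StringReader
            }
        } catch (XMLStreamException e) {
            // Wrap the checked exception so callers see an unchecked failure.
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<PubmedArticleSet><PubmedArticle><PMID>11748933</PMID></PubmedArticle></PubmedArticleSet>";
        System.out.println(startElementNames(xml));
    }
}
```

One side effect of the `getEventType()`-then-`next()` shape used in this thread is that an `END_DOCUMENT` branch is never reached, because `hasNext()` already returns false once the reader is positioned on `END_DOCUMENT`.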

SAX example:
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setValidating(true);
parserFactory.setNamespaceAware(true);
SAXParser parser = parserFactory.newSAXParser();
parser.parse(url.openConnection().getInputStream(), new PubmedEFetchHandler());

(PubmedEFetchHandler is a simple DefaultHandler with some debugging
output.)
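The thread does not show `PubmedEFetchHandler`. A hypothetical handler of the kind described, a `DefaultHandler` that only logs element boundaries, could look like the sketch below. It parses from a string and leaves validation off; both are assumptions made so the sketch runs offline, and `DebugHandler` is an invented name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for "a simple DefaultHandler with some debugging output".
final class DebugHandler extends DefaultHandler {

    // Records each callback so the sequence can be inspected after parsing.
    private final List<String> log = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        log.add("start " + qName);
        System.out.println("START_ELEMENT: " + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        log.add("end " + qName);
        System.out.println("END_ELEMENT: " + qName);
    }

    static List<String> parse(String xml) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        DebugHandler handler = new DebugHandler();
        try {
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML: " + e.getMessage(), e);
        }
        return handler.log;
    }

    public static void main(String[] args) {
        System.out.println(parse("<PubmedArticleSet><PubmedArticle/></PubmedArticleSet>"));
    }
}
```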
 
GArlington
      03-06-2008
On Mar 6, 12:57 pm, Kai Schlamp <(E-Mail Removed)> wrote:
> Hy!
>
> I tried to parse PubMed (a biomedical article database) with SAX and
> also StAX. The last one failed, but I am not sure why (see Exception
> below).
> Why does SAX succeed and StAX don't?
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)


As far as I can see, this request does NOT generate valid XML (or any
XML at all).

Kai Schlamp
      03-06-2008
This seems to be an error introduced when posting (I am posting through
Google Groups).
The link in your message no longer contains retmode=xml.
Please try this URL:
http://www.ncbi.nlm.nih.gov/entrez/e...33&retmode=xml
It should generate valid XML.
 
Kai Schlamp
      03-06-2008
OK, I checked the new link again and the problem remains: when I click
the link and it opens in Firefox, it is indeed not XML.
But when you then press the "Go To" button (the green button to the right
of the URL field), the valid XML appears. I am not sure why
this happens, but it has nothing to do with my original
problem. It seems to be a minor Firefox issue.


GArlington
      03-07-2008
OK, I tried accessing it with IE and it worked the first time; I thought
I had tried it in IE yesterday too, but...
I fetched your URL and parsed it (with my own methods) and it works,
so I suspect there is a problem with StAX...
The only thing I can suggest is: dump what you get from your
URL BEFORE you try to parse it, and then dump the data at each step
until you reach the error. This will help you find where the
problem first rears its ugly head...
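The dump-before-parse suggestion can be sketched by buffering the response first, so the bytes that are printed are exactly the bytes that get parsed. This is a hedged sketch: the class and method names are hypothetical, the UTF-8 decoding is only used for the printout, and in the thread's setup the input stream would come from `url.openConnection().getInputStream()`. Note that the `[row,col]` in a StAX error may refer to an external DTD rather than to the document itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: buffer the raw response, print it, then parse the very same bytes.
final class DumpThenParse {

    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Prints the input with line numbers, then counts the StAX events after START_DOCUMENT.
    static int dumpAndParse(InputStream in) {
        byte[] raw = readFully(in);
        // UTF-8 is assumed for the dump only; the parser detects the encoding itself.
        String[] lines = new String(raw, StandardCharsets.UTF_8).split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            System.out.printf("%4d: %s%n", i + 1, lines[i]);
        }
        int events = 0;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(raw));
            while (reader.hasNext()) {
                reader.next();
                events++;
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Parse failed: " + e.getMessage(), e);
        }
        return events;
    }

    public static void main(String[] args) {
        byte[] sample = "<a>x</a>".getBytes(StandardCharsets.UTF_8);
        System.out.println(dumpAndParse(new ByteArrayInputStream(sample)) + " events");
    }
}
```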
 
Kai Schlamp
      03-12-2008
I still have the same problem with StAX. I dumped the output of the
URL before parsing it, and it seems to be fine and well-formed.
But parsing with StAX still throws an exception in the first iteration
of the loop (SAX seems to work fine).
Below is a small test class. Can someone explain why this
happens?
I also tried copying the output of the URL into a file and parsing it
directly from disk... that didn't solve the problem either.
Perhaps I should try another StAX provider. I found one on the
net named Woodstox. Are there any others? What is the default
implementation? An Apache project?
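To see which implementation is the default in a given setup, one can simply ask the factory which class it returned. As far as I know, `XMLInputFactory.newInstance()` first checks the `javax.xml.stream.XMLInputFactory` system property, then a JRE-level properties file, then any `META-INF/services/javax.xml.stream.XMLInputFactory` entry on the classpath (which would explain why a Woodstox jar takes over just by being added), and only then falls back to the JDK's built-in parser, whose classes live in `com.sun.*` packages derived from Apache Xerces. A minimal sketch, with a hypothetical class name:

```java
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper: reports which StAX provider XMLInputFactory.newInstance() resolves to.
final class WhichStaxProvider {

    static String providerClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // To force a specific provider, start the JVM with e.g.
        //   -Djavax.xml.stream.XMLInputFactory=com.ctc.wstx.stax.WstxInputFactory
        // (that class must then be on the classpath, or newInstance() fails).
        System.out.println("StAX input factory: " + providerClassName());
    }
}
```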

The error output of the test class below:

START_DOCUMENT: 1.0
beforeNext
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
Message: A '(' character or an element type is required in the
declaration of element type "PubMedPubDate".
    at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:58
    at StaxTester.main(StaxTester.java:49)

The test class:

import java.net.URL;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxTester {

    public static void main(String[] args) {
        try {
            String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&id=11748933";
            //String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=stem+cells+AND+free+fulltext[filter]";
            URL url = new URL(address);

            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader parser =
                    factory.createXMLStreamReader(url.openConnection().getInputStream());

            // Print the current event, then advance; the exception is thrown by next().
            while (parser.hasNext()) {
                switch (parser.getEventType()) {
                    case XMLStreamConstants.START_DOCUMENT:
                        System.out.println("START_DOCUMENT: " + parser.getVersion());
                        break;

                    case XMLStreamConstants.END_DOCUMENT:
                        System.out.println("END_DOCUMENT: ");
                        parser.close();
                        break;

                    case XMLStreamConstants.NAMESPACE:
                        System.out.println("NAMESPACE: " + parser.getNamespaceURI());
                        break;

                    case XMLStreamConstants.START_ELEMENT:
                        System.out.println("START_ELEMENT: " + parser.getLocalName());
                        break;

                    case XMLStreamConstants.CHARACTERS:
                        if (!parser.isWhiteSpace())
                            System.out.println("CHARACTERS: " + parser.getText());
                        break;

                    case XMLStreamConstants.END_ELEMENT:
                        System.out.println("END_ELEMENT: " + parser.getLocalName());
                        break;

                    default:
                        break;
                }
                System.out.println("beforeNext");
                parser.next();
                System.out.println("afterNext");
            }

            // SAX succeeds. Why?
            // SAXParserFactory parserFactory = SAXParserFactory.newInstance();
            // parserFactory.setValidating(true);
            // parserFactory.setNamespaceAware(true);
            // SAXParser parser = parserFactory.newSAXParser();
            // parser.parse(url.openConnection().getInputStream(), new PubmedEFetchHandler());
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }
}

 
Owen Jacobson
      03-12-2008
On Mar 6, 8:57 am, Kai Schlamp <(E-Mail Removed)> wrote:
> Hy!
>
> I tried to parse PubMed (a biomedical article database) with SAX and
> also StAX. The last one failed, but I am not sure why (see Exception
> below).
> Why does SAX succeed and StAX don't?
> The XML document seems to be fine (see http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11...)
> Any suggestions?
>


...

> String address = "http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=11748933&retmode=xml";
> URL url = new URL(address);


...

> Error message:
> javax.xml.stream.XMLStreamException: ParseError at [row,col]:[50,39]
> Message: A '(' character or an element type is required in the
> declaration of element type "PubMedPubDate".


The XML document itself is fine, but it does not validate because of problems
in the DTD; StAX by default attempts to validate input documents. SAX
ignores the DTD associated with the XML document, and therefore
doesn't notice that the DTD is invalid.

-o
 
Kai Schlamp
      03-12-2008
On Mar 12, 10:27 pm, Owen Jacobson <(E-Mail Removed)> wrote:
>
> The XML document itself is fine, but non-validating due to problems in
> the DTD; StAX by default attempts to validate input documents. SAX is
> ignoring the DTD associated with the XML document, and therefore
> doesn't notice that the DTD is invalid.
>
> -o


Thanks for the answer.
So disabling DTD validation should solve the problem?
I tried
factory.setProperty("javax.xml.stream.isValidating", false);
(which is the default according to the Javadoc), but that didn't
solve the problem either.
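Since `isValidating` is already false by default, one further knob worth trying is DTD support itself. The sketch below turns off `XMLInputFactory.SUPPORT_DTD` (the standard `javax.xml.stream.supportDTD` property), which should stop the reader from processing the document's DTD at all. Whether this avoids the error for this particular PubMed document is an untested assumption, and it only makes sense if the document does not rely on entities declared in the DTD. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: a StAX factory that neither validates nor processes DTDs.
final class NoDtdStax {

    static XMLInputFactory newFactoryWithoutDtd() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE); // already the default
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);   // skip DTD processing
        return factory;
    }

    static int countStartElements(String xml) {
        int count = 0;
        try {
            XMLStreamReader reader = newFactoryWithoutDtd().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Parse failed: " + e.getMessage(), e);
        }
        return count;
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE a [<!ELEMENT a (b*)><!ELEMENT b EMPTY>]><a><b/><b/></a>";
        System.out.println(countStartElements(xml) + " start elements");
    }
}
```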

Another thing: I just tried the Woodstox implementation (I simply added
it to the classpath), and everything works fine, even without changing
any property. So it seems that the problem is specific to the
reference implementation.
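Because swapping providers changed the outcome without changing any property, it may help to print the effective defaults of whichever provider is active and compare them between the JDK's built-in parser and Woodstox. A small sketch, with a hypothetical class name, reading three standard `XMLInputFactory` properties:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;

// Hypothetical helper: snapshots a few standard StAX input properties of the active provider.
final class StaxDefaults {

    static Map<String, Object> snapshot() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        Map<String, Object> values = new LinkedHashMap<>();
        String[] names = {
            XMLInputFactory.IS_VALIDATING,
            XMLInputFactory.SUPPORT_DTD,
            XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES
        };
        for (String name : names) {
            // getProperty throws IllegalArgumentException for unsupported names, so check first.
            values.put(name, factory.isPropertySupported(name) ? factory.getProperty(name) : "unsupported");
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(XMLInputFactory.newInstance().getClass().getName());
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running it once with and once without Woodstox on the classpath shows whether the two providers start from different defaults.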
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
Plain old SAX and minidom useable? But how then? Dobedani Python 3 08-01-2007 01:04 PM
STAF(Software Testing Automation Framework) with STAX for running python test suites/cases davidodowd@gmail.com Python 0 08-15-2006 03:17 PM
Problems with SAX parser in Java (SAX2 driver class javax.xml.parsers.SAXParser found but cannot be loaded) Per Magnus L?vold Java 0 11-16-2004 04:02 PM
New Releases: Bon Jovi, Umbrellas Of Cherbourg,Stax Museum: Updated complete downloadable R1 DVD DB & info lists Doug MacLean DVD Video 0 01-10-2004 05:09 AM


