Velocity Reviews - Computer Hardware Reviews

Parsing generic XML

 
 
Roedy Green
      06-11-2008
I have some XML, namely PAD files, for which I have no schema, though
I probably could cook one up in a day or two.

Similarly, I have some XHTML I want to screen-scrape, where I really
only care about the <table>, <tr> and <td> elements.

So what I am after is some sort of extremely relaxed schema that will
eat pretty well anything so long as the tags balance.

I tried parsing without any schema at all, and it choked on &nbsp;
entities.
--

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
 
Owen Jacobson
      06-11-2008
On Jun 11, 10:40 am, Roedy Green <(E-Mail Removed)>
wrote:
> I have some XML, namely PAD files, for which I have no schema, though
> I probably could cook one up in a day or two.
>
> Similarly, I have some XHTML I want to screen-scrape, where I really
> only care about the <table>, <tr> and <td> elements.
>
> So what I am after is some sort of extremely relaxed schema that will
> eat pretty well anything so long as the tags balance.
>
> I tried parsing without any schema at all, and it choked on &nbsp;
> entities.


Entity references (&nbsp; and friends) only have meaning with respect
to a DTD which maps them to entities (e.g., &#160; in the case of
&nbsp;). Apart from the five predefined ones (&amp;, &lt;, &gt;, &apos;
and &quot;), XML documents which contain entity references MUST contain
a declaration somewhere; there's not really any avoiding it.
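
To make that concrete, here is a minimal sketch using the JDK's built-in DOM parser. The class name DeclaredEntityDemo, the <pad> root element and the sample strings are made up for illustration; the point is only that the same body parses once an internal DTD subset declares nbsp, and is rejected when nothing declares it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of any library.
class DeclaredEntityDemo {
    /** Returns the root element's text, or empty if the input does not parse. */
    static Optional<String> rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // check well-formedness only
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return Optional.of(doc.getDocumentElement().getTextContent());
        } catch (SAXException | IOException | ParserConfigurationException e) {
            // An undeclared entity reference is a fatal well-formedness error.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        String body = "<pad>a&nbsp;b</pad>"; // made-up sample content
        // The internal subset declares nbsp as U+00A0 (no-break space).
        String withDtd = "<!DOCTYPE pad [ <!ENTITY nbsp \"&#160;\"> ]>" + body;
        System.out.println("no declaration parses:   " + rootText(body).isPresent());
        System.out.println("with declaration parses: " + rootText(withDtd).isPresent());
    }
}
```

The default error handler also logs the failure to stderr for the first input; that is expected here.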

Fortunately, for XHTML that's easy; there's a published DTD.

In the case of PAD files you may have to replace the entity references
manually with the characters (or numeric character references) they
stand for, if you can't find a DTD that declares them.
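
A rough sketch of that manual replacement, done as a text pass before the parser sees the file. The class name EntityPatcher and the three entries in the table are assumptions for illustration; which named entities actually turn up in PAD files is something you would have to check against your own data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the entity table is illustrative, not a PAD standard.
class EntityPatcher {
    private static final Map<String, String> NUMERIC = new LinkedHashMap<>();
    static {
        NUMERIC.put("&nbsp;", "&#160;"); // no-break space
        NUMERIC.put("&copy;", "&#169;"); // copyright sign
        NUMERIC.put("&reg;", "&#174;");  // registered sign
    }

    /** Swaps known named references for numeric ones; leaves everything else alone. */
    static String patch(String xml) {
        String result = xml;
        for (Map.Entry<String, String> e : NUMERIC.entrySet()) {
            result = result.replace(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Predefined references such as &amp; are deliberately not touched.
        System.out.println(patch("<pad>a&nbsp;b &amp; c &copy; 2008</pad>"));
    }
}
```

Numeric character references need no declaration, so the patched text can go straight to any parser. Being a plain string replace, this would also rewrite matches inside CDATA sections, which may or may not matter for your files.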

Any basic XML parser (JDOM, dom4j, SAX, W3C DOM, et multiple cetera)
should accept any well-formed document if you turn off validation.
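
For the table-scraping case, something along these lines might do, using the W3C DOM that ships with the JDK. The class name TableCells and the sample markup are made up; the sketch assumes the input is well-formed and has no DOCTYPE and no undeclared entity references.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration.
class TableCells {
    /** Collects the trimmed text of every <td>, ignoring all other markup. */
    static List<String> cellTexts(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no schema or DTD validation
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList cells = doc.getElementsByTagName("td");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < cells.getLength(); i++) {
                texts.add(cells.item(i).getTextContent().trim());
            }
            return texts;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><p>ignored</p>"
                + "<table><tr><td>1</td><td> 2 </td></tr></table>"
                + "</body></html>";
        System.out.println(cellTexts(page));
    }
}
```

The parser never needs to know what <html>, <p> or anything else means; as long as the tags balance, the <td> elements can be picked out by name.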

-o
 