Velocity Reviews - Computer Hardware Reviews

parsing non-well-formed XML (SAX)

 
 
Timo Nentwig
      06-04-2004
Hi!

I need to parse multi-MByte "XML" files which are not well-formed, i.e.
there are plenty of <TAGS> in there instead of <TAGS />. I'm also not
sure about case sensitivity.

Any ready-to-use solutions?

Timo
 
 
 
 
 
Andy Fish
      06-04-2004
Well, I shouldn't think there are any XML parsers you can use.

The trouble with non-well-formed documents is that only you know which
kinds of non-well-formedness are acceptable and how to interpret them. Any
piece of information that is not a well-formed XML document is a badly
formed XML document!

So, the key to a successful solution is to write down your definition
of a valid input document. Only once you have done this can you evaluate
different approaches.

If there are only a few well-known kinds of badly formed tags, you could
pre-process the input first to generate XML. E.g. say you knew that the TAGS
element could never have any content but might be missing its end-tag
delimiter (like <br> in HTML); it would then be easy to pick it up.
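
As a rough sketch of that pre-processing idea (not a tested solution for your files): the class below rewrites every unclosed <TAGS> as <TAGS/> with a regular expression and then hands the result to the JDK's standard SAX parser. The class and method names (EmptyTagFixer, selfClose, countElements) are made up for illustration, and it assumes the whole input fits in a String and that attribute values never contain '<' or '>'.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EmptyTagFixer {

    // Rewrites <TAGS> and <TAGS attr="x"> as <TAGS/> and <TAGS attr="x"/>.
    // Tags that are already self-closed (<TAGS/> or <TAGS />) are left alone.
    // Assumption: attribute values never contain '<' or '>'.
    public static String selfClose(String text, String tagName) {
        Pattern open = Pattern.compile("<" + Pattern.quote(tagName) + "(\\s[^<>]*)?>");
        Matcher m = open.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String attrs = m.group(1) == null ? "" : m.group(1);
            String fixed = attrs.trim().endsWith("/")
                    ? m.group()                       // already self-closed
                    : "<" + tagName + attrs + "/>";   // add the missing slash
            m.appendReplacement(out, Matcher.quoteReplacement(fixed));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Parses the repaired text with the JDK SAX parser and counts start-element events.
    public static int countElements(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String broken = "<doc><TAGS><item>a</item><TAGS id=\"1\"><TAGS /></doc>";
        String fixed = selfClose(broken, "TAGS");
        System.out.println(fixed);                // <doc><TAGS/><item>a</item><TAGS id="1"/><TAGS /></doc>
        System.out.println(countElements(fixed)); // 5
    }
}
```

For multi-MByte input you would probably want to apply the same rewrite chunk by chunk inside a Reader wrapper instead of building one big String, but the matching logic would stay the same.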

Failing that, ANTLR is a well-known parser generator which would be a
building block on the way to making your own parser.
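
To give an idea of what "your own parser" could look like at its very simplest, here is a hand-written scanner. It does not use ANTLR; it is a plain regex loop, shown only as a sketch. It folds tag names to lower case (for the case question) and treats a caller-supplied set of names as always-empty elements. LenientTagScanner, scan and the "START/END/TEXT" event strings are invented for this example, and comments, CDATA, entities and attributes are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientTagScanner {

    // group 1 = "/" for end tags, group 2 = tag name,
    // group 3 = "/" for self-closed tags such as <TAGS /> .
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_][\\w.-]*)[^<>]*?(/?)>");

    // Returns a flat list of events such as "START doc", "TEXT a", "END doc".
    // Names listed in emptyTags (lower case) get an END event right after START,
    // whether or not the input closed them.
    public static List<String> scan(String text, Set<String> emptyTags) {
        List<String> events = new ArrayList<>();
        Matcher m = TAG.matcher(text);
        int pos = 0;
        while (m.find()) {
            String between = text.substring(pos, m.start()).trim();
            if (!between.isEmpty()) {
                events.add("TEXT " + between);
            }
            String name = m.group(2).toLowerCase(Locale.ROOT); // ignore case
            if (!m.group(1).isEmpty()) {
                // A stray </TAGS> for a known-empty tag is dropped here,
                // because its END event was already emitted.
                if (!emptyTags.contains(name)) {
                    events.add("END " + name);
                }
            } else {
                events.add("START " + name);
                if (!m.group(3).isEmpty() || emptyTags.contains(name)) {
                    events.add("END " + name);
                }
            }
            pos = m.end();
        }
        String tail = text.substring(pos).trim();
        if (!tail.isEmpty()) {
            events.add("TEXT " + tail);
        }
        return events;
    }
}
```

Calling scan("<Doc><TAGS><Item>a</ITEM></doc>", Collections.singleton("tags")) yields a balanced event stream even though the input is neither closed nor consistent in case. Whether folding case like this is acceptable depends on the definition of valid input you wrote down earlier.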
 
 
 
 
 
Timo Nentwig
      06-04-2004
Andy Fish wrote:
> well I shouldn't think there are any XML parsers you can use.


Something like NekoHTML's HTMLTagBalancer...
 