Hi,
I have actually done exactly what you suggested using the Expat SAX
parser (and this feature is now included in the XML gawk extension
found at http://sourceforge.net/projects/xmlgawk).
The key is to call XML_Parse until it returns XML_STATUS_ERROR. Expat
reports an error there because a second root element following the
first makes the input ill-formed (XML_GetErrorCode should return
XML_ERROR_JUNK_AFTER_DOC_ELEMENT in that case). At that point, call
XML_GetCurrentByteIndex to find the byte offset of the error. You can
then close out the parsing of the previous document and start parsing
the new one, which begins at that offset into the file. To see how
this is done, you can look in xml_puller.c in the SourceForge
repository:
http://cvs.sourceforge.net/viewcvs.p....6&view=markup
Or you can just use xmlgawk and not worry about implementing this
yourself.
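If you do want to try the idea yourself, here is a minimal sketch of it.
It is not the code from xml_puller.c. It uses Python's xml.parsers.expat
module, which wraps the same Expat library: Parse stands in for
XML_Parse, and ErrorByteIndex for XML_GetCurrentByteIndex at the point
of the error. The function name split_concatenated is made up for
illustration, and the sketch assumes the whole input is already in
memory and that the documents carry no XML declaration.

```python
import xml.parsers.expat as expat
from xml.parsers.expat import errors

# Numeric code Expat reports when content follows the root element.
JUNK_AFTER_DOC = errors.codes[errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT]


def split_concatenated(data: bytes):
    """Return (start, end) byte ranges, one per XML document in data.

    Hypothetical helper: each document is parsed with a fresh parser,
    and the "junk after document element" error marks where the next
    document begins.
    """
    ranges = []
    start = 0
    while start < len(data):
        parser = expat.ParserCreate()
        # Real code would install handlers on `parser` here to process
        # the events of the current document.
        try:
            parser.Parse(data[start:], True)
        except expat.ExpatError as err:
            if err.code != JUNK_AFTER_DOC:
                raise  # a genuine syntax error, not a document boundary
            # ErrorByteIndex is relative to the slice given to this parser.
            end = start + parser.ErrorByteIndex
            ranges.append((start, end))
            start = end
        else:
            # No error: the rest of the buffer was a single document.
            ranges.append((start, len(data)))
            break
    return ranges
```

For example, split_concatenated(b'<a>1</a><b>2</b>') should yield the
ranges (0, 8) and (8, 16). Any other parse error is re-raised, since
only the junk-after-root error indicates a document boundary.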
Regards,
Andy