XML Parsing Problems with SAX xerces

 
 
John Smith
      09-26-2005
I am trying to parse an XML document that starts with the following tag:

<?xml version='1.0' encoding='windows-1252' ?>

This causes the following error:

Caused by: org.xml.sax.SAXParseException: The encoding "windows-1252" is not supported.
    at org.apache.xerces.framework.XMLParser.reportError(XMLParser.java:1056)
    at org.apache.xerces.readers.DefaultEntityHandler.startReadingFromDocument(DefaultEntityHandler.java:541)
    at org.apache.xerces.framework.XMLParser.parseSomeSetup(XMLParser.java:305)
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:947)

Is there a way I can get it to support windows-1252, or to ignore the
declaration? I cannot edit the document itself.

Thanks

Jon
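
One workaround that may apply here, sketched under assumptions: decode the bytes with Java's own windows-1252 decoder and hand the parser a character stream instead of a byte stream. The SAX InputSource documentation says that when a character stream is supplied, the parser reads it directly and disregards the encoding declaration. The class and method names below (Cp1252Parse, parseText) are made up for illustration, the sketch uses the JAXP parser bundled with current JDKs, and it has not been checked against the old Xerces release shown in the stack trace. It also assumes the JRE ships the windows-1252 charset, which is common but not guaranteed.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class Cp1252Parse {

    // Hypothetical helper: decodes the stream as windows-1252 ourselves,
    // then lets SAX parse the resulting characters. Returns all text content.
    static String parseText(InputStream in) throws Exception {
        Reader reader = new InputStreamReader(in, Charset.forName("windows-1252"));
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // Because the InputSource wraps a Reader, the parser has no bytes to decode.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(reader), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = "<?xml version='1.0' encoding='windows-1252' ?><a>caf\u00e9</a>"
                .getBytes(Charset.forName("windows-1252"));
        System.out.println(parseText(new ByteArrayInputStream(doc)));
    }
}
```

The document itself is left untouched; only the way it is fed to the parser changes.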


 
Roedy Green
      09-26-2005
On Mon, 26 Sep 2005 08:56:22 +0100, "John Smith"
<(E-Mail Removed)> wrote or quoted :

><?xml version='1.0' encoding='windows-1252' ?>

I thought the XML had UTF-8 as the only supported encoding. That was
one of its key features that made it a suitable interchange format.

Now I see every XML utility listing its set of supported encodings!
(Imagine an exorcist crossing his arms in horror.)

--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.
 
John C. Bollinger
      09-27-2005
Roedy Green wrote:

> I thought the XML had UTF-8 as the only supported encoding. That was
> one of its key features that made it a suitable interchange format.


No, but you may have been thinking of this: "In the absence of
information provided by an external transport protocol (e.g. HTTP or
MIME), it is a fatal error for an entity including an encoding
declaration to be presented to the XML processor in an encoding other
than that named in the declaration, or for an entity which begins with
neither a Byte Order Mark nor an encoding declaration to use an encoding
other than UTF-8." [XML 1.1, section 4.3.3; the same appears in XML
1.0, also in section 4.3.3]

You might also have been thinking of the fact that XML is defined in
terms of Unicode characters, which indeed is a key feature that makes it
a suitable interchange format.

> Now I see every XML utility listing its set of supported encodings!
> (Imagine an exorcist crossing his arms in horror.)


Given UTF-8's status as the default encoding, any utility that does not
support that encoding is handicapped to the point of being downright
broken. I know of none such, and never expect to see any. With that
being the case it is safe to encode any XML document you create in
UTF-8; any service or utility that fails to read it on account of the
encoding has been designed specifically to prevent you from feeding it a
document of your own creation. (So why fight it?)
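
As a minimal sketch of that advice, assuming plain JDK classes and nothing from the original posts: write the bytes through a UTF-8 encoder and declare the same encoding, so the declaration and the actual bytes agree. The names Utf8XmlWriter, write and escape, and the <note> element, are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf8XmlWriter {

    // Hypothetical helper: serializes one text value as a tiny UTF-8 XML document.
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            // The declared encoding matches the encoder used for the bytes.
            w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            w.write("<note>" + escape(text) + "</note>\n");
        }
        return out.toByteArray();
    }

    // Escapes the characters that are markup-significant in element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) throws IOException {
        System.out.println(write("caf\u00e9 & co").length + " bytes written");
    }
}
```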

--
John Bollinger
(E-Mail Removed)
 
Roedy Green
      09-27-2005
On Mon, 26 Sep 2005 22:19:23 -0500, "John C. Bollinger"
<(E-Mail Removed)> wrote or quoted :

>Given UTF-8's status as the default encoding, any utility that does not
>support that encoding is handicapped to the point of being downright
>broken. I know of none such, and never expect to see any. With that
>being the case it is safe to encode any XML document you create in
>UTF-8; any service or utility that fails to read it on account of the
>encoding has been designed specifically to prevent you from feeding it a
>document of your own creation. (So why fight it?)


But the problem is that if you let people encode in CP278 (Scandinavian
EBCDIC), you force every reader of that file to support obsolete baggage
as well.

There was no advantage in allowing anything but UTF-8 and perhaps
UTF-16. If people want to write such files for internal purposes, that
is their business, but such files have no business being passed around
as interchange files.

Java has to support all these old encodings to deal with legacy apps,
but XML does not.

The other thing: embedding the encoding in plain text is a bit of a
chicken-and-egg problem. You have to know the encoding to interpret
the encoding specification. Unicode has the advantage that you can tell
what you have got just by examining the first few bytes.
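
For what it is worth, the XML 1.0 specification covers this chicken-and-egg case in its non-normative Appendix F: because a declaration must begin with "<?xml", the first few bytes narrow the encoding down to a family, which is enough to read the declaration itself. Below is a simplified sketch of that idea; the class name XmlEncodingSniffer and the returned labels are invented for illustration, and the UCS-4 patterns from the appendix are left out.

```java
import java.nio.charset.StandardCharsets;

public class XmlEncodingSniffer {

    // Guesses an encoding family from the leading bytes of an XML entity.
    // Simplified: UCS-4/UTF-32 patterns are not handled.
    static String sniff(byte[] b) {
        if (startsWith(b, 0xEF, 0xBB, 0xBF)) return "UTF-8 (BOM)";
        if (startsWith(b, 0xFE, 0xFF)) return "UTF-16BE (BOM)";
        if (startsWith(b, 0xFF, 0xFE)) return "UTF-16LE (BOM)";
        if (startsWith(b, 0x00, 0x3C, 0x00, 0x3F)) return "UTF-16BE (no BOM)";
        if (startsWith(b, 0x3C, 0x00, 0x3F, 0x00)) return "UTF-16LE (no BOM)";
        // "<?xm" in ASCII: UTF-8, ISO 8859-x, windows-125x and similar.
        if (startsWith(b, 0x3C, 0x3F, 0x78, 0x6D)) return "ASCII-compatible; read the declaration";
        // "<?xm" in EBCDIC.
        if (startsWith(b, 0x4C, 0x6F, 0xA7, 0x94)) return "EBCDIC family; read the declaration";
        return "UTF-8 (default)";
    }

    // True if data begins with the given unsigned byte values.
    static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version='1.0' encoding='windows-1252' ?><a/>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(sniff(doc));
    }
}
```

Once the family is known, a parser can read the declaration in a provisional encoding from that family and then switch to the one it names.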

Remember Bill the Cat from Bloom County? I think this decision
deserves one of his hair ball spitting up noises.
--
Canadian Mind Products, Roedy Green.
http://mindprod.com Again taking new Java programming contracts.
 