Velocity Reviews - Computer Hardware Reviews


Encoding problem with SAX parser

 
 
Martin Schlatter
      12-10-2003
I'm parsing an XML document with a SAX parser.
I initialise it in the following way:

javax.xml.parsers.DocumentBuilderFactory docBuilderFactory =
    javax.xml.parsers.DocumentBuilderFactory.newInstance();
docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.parse(new File(fname));

But while parsing, I get an exception because there are characters
which are not valid UTF-8. I cannot change the input file. Is
there any way to skip over the invalid characters? Could I use
docBuilder.parse(InputStream) and skip the invalid characters in
the stream?

Jens Martin Schlatter

--
"As a human you are too stupid, and as a pig your ears are too short."
Norbert Gleissner of the Triple-D-Ranch, in de.rec.tiere.pferde
 
 
 
 
 
Mike Schilling
      12-12-2003

"Martin Schlatter" wrote:
> I'm parsing an XML document with a SAX parser.
> I initialise it in the following way:
>
> javax.xml.parsers.DocumentBuilderFactory docBuilderFactory =
>     javax.xml.parsers.DocumentBuilderFactory.newInstance();
> docBuilder = docBuilderFactory.newDocumentBuilder();
> doc = docBuilder.parse(new File(fname));
>
> But while parsing, I get an exception because there are characters
> which are not valid UTF-8. I cannot change the input file.


Is the file in UTF-8? If not, is it in any valid encoding? If so, try
replacing your last line with

org.xml.sax.InputSource src =
    new org.xml.sax.InputSource(new FileInputStream(fname));
// pass the encoding name as a String, e.g. "ISO-8859-1"
src.setEncoding(YourEncodingNameGoesHere);
doc = docBuilder.parse(src);
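
For what it's worth, here is a minimal self-contained sketch of that first suggestion. The class name, the sample bytes, and the choice of "ISO-8859-1" are only assumptions for illustration; substitute whatever encoding the file really uses.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class ExplicitEncodingParse {

    // Parses the byte stream as XML, telling the parser which
    // encoding to assume instead of letting it default to UTF-8.
    static Document parse(InputStream in, String encoding) throws Exception {
        DocumentBuilder docBuilder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource src = new InputSource(in);
        src.setEncoding(encoding);
        return docBuilder.parse(src);
    }

    public static void main(String[] args) throws Exception {
        // U+00E9 encoded as the single byte 0xE9 is valid ISO-8859-1
        // but is not a valid UTF-8 sequence on its own.
        byte[] xml = "<root>caf\u00e9</root>"
            .getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(xml), "ISO-8859-1");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

This only helps when the whole file is consistently in some other encoding. It does nothing for a file that is mostly UTF-8 with a few stray bytes.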

If it is not in any valid encoding, you'll have to create a
FilterInputStream that removes the invalid byte sequences and replace
your last line with:

doc = docBuilder.parse(new YourFilterStream(new FileInputStream(fname)));
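
A hand-written filter stream has to understand UTF-8 byte sequences itself. One possible alternative, sketched below, swaps the FilterInputStream for a Reader whose CharsetDecoder is told to ignore malformed input, and hands that Reader to the parser through an InputSource. The class and method names are made up for this example, and silently dropping bytes may or may not be acceptable for your data, so treat it as a starting point only.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical class name, used only for this illustration.
public class LenientUtf8Parse {

    // Wraps the byte stream in a Reader whose decoder silently drops
    // byte sequences that are not valid UTF-8.
    static Reader lenientUtf8Reader(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.IGNORE)
            .onUnmappableCharacter(CodingErrorAction.IGNORE);
        return new InputStreamReader(in, decoder);
    }

    // The parser receives already-decoded characters, so it never
    // sees the invalid bytes.
    static Document parse(InputStream in) throws Exception {
        DocumentBuilder docBuilder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return docBuilder.parse(new InputSource(lenientUtf8Reader(in)));
    }

    public static void main(String[] args) throws Exception {
        // The byte 0xFF can never occur in well-formed UTF-8.
        byte[] xml = {'<', 'r', '>', 'a', (byte) 0xFF, 'b', '<', '/', 'r', '>'};
        Document doc = parse(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Using CodingErrorAction.REPLACE instead of IGNORE would substitute U+FFFD for each bad sequence, which makes the damage visible in the parsed document rather than hiding it.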



 
 
 
 
 
Martin Schlatter
      12-14-2003
> Is the file in UTF-8?

Yes, it's UTF-8, but some characters are invalid.

> If it is not in any valid encoding, you'll have to create a
> FilterInputStream that removes the invalid byte sequences and replace
> your last line with:
>
> doc = docBuilder.parse(new YourFilterStream(new FileInputStream(fname)));


OK, I see. Thanks! I'll try that!

Jens Martin Schlatter


 