HTML to XML Conversion - Difficulty with Tidy and TagSoup

I'm trying to convert html pages to xml and I'm having some difficulty
with the following:

1. I try to use Tidy but the html that I'm trying to convert to xhtml
has too many errors and so I spend a lot of time trying to "fix" the
html before running it through Tidy. I'm using Tidy with -asxml.

2. I've tried using TagSoup with JDOM but the SAXBuilder internally
tries to set the namespace prefixes and TagSoup does not support that
internal feature.
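
For reference, here is a rough sketch of one route that sidesteps SAXBuilder altogether: hand a SAX XMLReader to a JAXP identity Transformer and let it serialize the events as XML. This is only a sketch under assumptions. The class and method names (SaxToXml, toXml, defaultReader) are made up for illustration, and the reader below is the JDK's own parser standing in so the snippet runs without extra jars. The idea would be to pass a TagSoup reader (new org.ccil.cowan.tagsoup.Parser(), which implements XMLReader) in its place; I have not verified that against any particular TagSoup version.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper: serializes the SAX events of any XMLReader as XML text.
class SaxToXml {

    // Runs an identity transform: SAX events in, XML text out.
    static String toXml(XMLReader reader, String markup) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        SAXSource source = new SAXSource(reader, new InputSource(new StringReader(markup)));
        identity.transform(source, new StreamResult(out));
        return out.toString();
    }

    // Stand-in reader (the JDK's strict XML parser), so it needs well-formed input.
    // Assumption: a TagSoup Parser instance could be passed to toXml() instead,
    // which is what would make messy HTML acceptable.
    static XMLReader defaultReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Hello</p></body></html>";
        System.out.println(toXml(defaultReader(), page));
    }
}
```

If a JDOM document is still needed afterwards, the resulting well-formed XML string could then be parsed with an ordinary XML parser instead of going through TagSoup directly.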

I would really appreciate help from someone who has dealt with having
to crank out lots of xml from poorly formatted html. I appreciate
any help!
