Velocity Reviews - Computer Hardware Reviews


Tidy; how to make it XML-conform? <BR> needs to be closed

 
 
Ragnar
 
      10-23-2006
Hi

I have one question regarding Tidy (http://tidy.sourceforge.net). My
source XML file has a lot of unclosed <BR> tags. Which option do I
need (in my Tidy config file) to close them as <BR/> and make valid XML
out of it?


regards
Rag.

 
Richard Tobin
 
      10-23-2006
In article <(E-Mail Removed)>,
Ragnar <(E-Mail Removed)> wrote:

>I have one question regarding Tidy (http://tidy.sourceforge.net). My
>source XML-file has got a lot of unclosed <BR>-tags. Which command do I
>need (in my tidy config-file) to close it <BR/> and make valid XML out
>of it?


Use the -asxml or -asxhtml flag.

-- Richard
 
Bjoern Hoehrmann
 
      10-24-2006
* Ragnar wrote in comp.text.xml:
>I have one question regarding Tidy (http://tidy.sourceforge.net). My
>source XML-file has got a lot of unclosed <BR>-tags. Which command do I
>need (in my tidy config-file) to close it <BR/> and make valid XML out
>of it?


HTML Tidy is not designed to clean up arbitrary XML documents, so if by
"XML-file" you really mean some arbitrary XML document, then it might be
difficult to address your problem. If you mean "HTML" or "XHTML" instead,
then use the output-* family of options (output-xml, output-xhtml), or the
-asxml command line option, and make sure you have not set the input-xml flag.
--
Björn Höhrmann · (E-Mail Removed) · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
 
Ragnar
 
      10-24-2006
Thank you for your help. Getting support is very important because
I have to finish this today.

My command line looks like this: tidy -asxml -config config.txt old.xml

I get the same error as without "-asxml":

Error: unexpected </reference> in <BR>

That means it finds an unclosed <BR> tag inside the "reference" element.

To get rid of it I could use "no-xml" as the input format, but then Tidy
would transform my XML into an HTML structure, which is not what I want.


Ragnar

 
Ragnar
 
      10-24-2006
Another question regarding Tidy:

I want to use the COM wrapper for Tidy and have found the example below.
I don't know why "Stat As Long" is used. I tried to do without "Stat",
but then I cannot call objTidyDoc.MethodName directly.


' Stat receives the status code each call returns (assumed to follow
' libtidy: 0 = no problems, 1 = warnings, 2 = errors, negative = severe)
Dim Stat As Long
Dim objTidyDoc As TidyDocument
Set objTidyDoc = New TidyDocument
Stat = 0
Stat = objTidyDoc.LoadConfig(strTidyConfig)
Stat = objTidyDoc.ParseFile(strFilePath & strXmlFileName)
Stat = objTidyDoc.CleanAndRepair()
Stat = objTidyDoc.RunDiagnostics()
Stat = objTidyDoc.SaveFile(strFilePath & strXmlFileName)

 
Ragnar
 
      10-26-2006
Now I know how to use the COM wrapper, but my main question is still
open:

How can I transform this source XML into valid XML without the
workaround of producing HTML output? I don't want HTML tags like
<HEAD> and <BODY> around it.

http://www.ticope.de/tmp/source.xml/download

Help is VERY much appreciated; this task has kept me busy too long.
Rag.

 
Joseph Kesselman
 
      10-26-2006
If your input isn't HTML, Tidy may not be able to help you, and nothing
else out there is likely to be able to read your mind and guess that you
intended <BR> tags to autoterminate.

Since you know that *was* your intent, how about just doing a text-level
global replace of <BR> with <BR/>?
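A minimal sketch of that text-level replace, written in Python rather than the VB used elsewhere in this thread. It assumes the file has no </BR> end tags and no <BR> inside comments or CDATA sections (a plain text replace can't tell those apart); close_br_tags is a hypothetical helper name. The key point is that it runs on the raw file text, before any XML parser sees it.

```python
import re
import xml.etree.ElementTree as ET

# Matches <BR>, <br>, <BR >, <br clear="all"> and already-closed <BR/>,
# case-insensitively. \b keeps it from matching tags such as <brand>.
BR_TAG = re.compile(r"<(br)\b([^<>]*?)\s*/?>", re.IGNORECASE)


def close_br_tags(text: str) -> str:
    """Rewrite every <BR ...> start tag as an empty-element tag <BR .../>."""
    # \1 keeps the original spelling (BR or br), \2 keeps any attributes.
    return BR_TAG.sub(r"<\1\2/>", text)


if __name__ == "__main__":
    source = "<doc><reference>line one<BR>line two<br clear='all'>end</reference></doc>"
    fixed = close_br_tags(source)
    print(fixed)
    # Only after the text fix does the document parse as XML.
    root = ET.fromstring(fixed)
    print(root.find("reference").text)
```

Because it works on text, the same replace could be done with VB's Replace on the file contents before calling Load, rather than on a node's .Text afterwards.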
 
Ragnar
 
      10-26-2006

Joseph Kesselman wrote:
> Since you know that *was* your intent, how about just doing a text-level
> global replace of <BR> with <BR/>?


Joseph,
that is a very nice idea.

It could look like this (assuming <BR> appears in the "reference" element):
Set objDOMnode = objDom.selectSingleNode("//reference")
If Not objDOMnode Is Nothing Then
    strReference = objDOMnode.Text
End If
strReference = Replace(strReference, "<BR>", "<BR/>", 1, -1, vbTextCompare)

But I don't get a value in strReference, which means the XML has to be
valid before I can work with XMLDOM. Am I right? I checked it by closing
the <BR/> manually, and then I do get a value for strReference.

 
Joe Kesselman
 
      10-26-2006
Ragnar wrote:
> But I dont get a value in strReference which means that XML has to be
> valid before working with XMLDOM.


XML has to be well-formed before using any XML tools. An unterminated
element, such as your <BR>, is not well-formed XML. Fix it first.
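A quick way to find out whether (and where) a file fails well-formedness before handing it to DOM code is to run it through a non-validating parser. A minimal Python sketch using the standard library's expat-based ElementTree; find_well_formedness_error is a hypothetical name. For an unclosed <BR> inside <reference>, expat reports a "mismatched tag", which is the same problem Tidy describes as "unexpected </reference> in <BR>".

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def find_well_formedness_error(text: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # where expat gave up
        return line, column, str(err)
    return None


if __name__ == "__main__":
    for sample in ("<reference>one<BR>two</reference>",
                   "<reference>one<BR/>two</reference>"):
        problem = find_well_formedness_error(sample)
        print(sample, "->", problem or "well-formed")
```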

--
() ASCII Ribbon Campaign | Joe Kesselman
/\ Stamp out HTML e-mail! | System architexture and kinetic poetry
 
Andy Dingley
 
      10-27-2006

Ragnar wrote:

> How can I transform this source-xml into valid xml without using the
> workaround of getting an HTML-output?


Find some non-Tidy, Tidy-like XML tool? Maybe write one for your
specific task?

Tidy uses an approximation of an SGML parser and a tag-soup strainer to
take "approximate HTML", turn it into the best-guess internal
(DOM-like) model of the intended page, then serialise it accurately.
This relies on three things that you don't have available:

* SGML parsing (omitted tags can often be inferred cleanly)
* A known HTML DTD
* Fix-up code outside the SGML parser that has assumed HTML-soup
behaviours coded explicitly into it.

If your problem is "bad XML" that isn't even approximating HTML, then I
sympathise, but Tidy has three of its hands tied.

Why is your bad XML bad? What's the problem? Can you build some specific
tool that fixes some specific problem? Even if it has to work with
simple text-file processing and can't support more than one encoding,
it might be enough.

I've done a lot of work with RSS, which is only approximate XML at best
and often significantly invalid. Typically it includes HTML entity
references (e.g. &eacute;) that XML doesn't define. It's not too hard to
scan the whole document with a crude entity reference expander that can
map these (from a known list) onto the numeric form. I usually try to
parse them as XML first; if this fails, I check for the presence of such
entities, convert them and then attempt to re-parse.
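A rough Python version of that parse, convert, re-parse loop. It assumes the "known list" is the HTML 4 entity table in Python's html.entities.name2codepoint; expand_html_entities and parse_leniently are hypothetical helper names, and this is a sketch of the approach described above rather than the poster's actual code.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# XML predefines these five, so they never need converting.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
# A named entity reference such as &eacute; (numeric ones like &#233; don't match).
NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def expand_html_entities(text: str) -> str:
    """Rewrite known HTML named entity references as numeric character references."""
    def to_numeric(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML built-ins and unknown names untouched
        return "&#%d;" % name2codepoint[name]
    return NAMED_REF.sub(to_numeric, text)


def parse_leniently(text: str) -> ET.Element:
    """Parse as XML; if that fails, expand HTML entities and try exactly once more."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # If the second parse fails too, its ParseError propagates to the caller.
        return ET.fromstring(expand_html_entities(text))


if __name__ == "__main__":
    item = "<title>Caf&eacute; &amp; bar</title>"
    print(expand_html_entities(item))
    print(parse_leniently(item).text)
```

Trying the strict parse first means well-formed feeds are never rewritten; the text pass only runs on input that has already failed.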

 