Parse a html file as a XML file

 
 
Stan SR
 
      01-19-2008
Hi,

I need to read an HTML file and parse it as an XML file.

All my HTML files have this structure:
<html>
<head>
<title>
</title>
<script language="javascript">
</script>
</head>
<body>
</body>
</html>

My code has to read some sections (title, script, body).
Everything works when the script (JavaScript) section has no code or only
a little, but sometimes it fails when there are characters like ;
(especially in a "for" statement).
To make it work, I had to "decorate" the script section with
<![CDATA[ ]]>, so that it looks like this:

<script language="javascript">
<![CDATA[

]]>
</script>

Is there a way to parse the file without adding the <![CDATA[ ]]> section?
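
A minimal sketch of the failure and the CDATA workaround described above, written in Python with the standard-library xml.etree.ElementTree rather than the poster's .NET code, which is not shown in the thread. The sample page, the script_text helper, and all variable names are assumptions made for illustration. It also suggests that the likely culprit in a "for" statement is the < comparison rather than the ; separator: a semicolon is ordinary character data in XML, while < and & are markup characters, so any conforming XML parser should reject them outside CDATA.

```python
import xml.etree.ElementTree as ET

# Hypothetical page following the structure in the post, with a "for" loop.
RAW = """<html><head><title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { total += i; }
</script>
</head><body></body></html>"""

# Same page with the script body wrapped in a CDATA section.
WRAPPED = RAW.replace(
    '<script language="javascript">\n',
    '<script language="javascript">\n<![CDATA[\n',
).replace("</script>", "]]>\n</script>")


def script_text(markup):
    """Return the script body, or None if the markup is not well-formed XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return root.find("head/script").text


raw_result = script_text(RAW)          # None: the bare "<" is rejected
wrapped_result = script_text(WRAPPED)  # script body, "<" preserved

# Semicolons alone do not upset an XML parser.
semicolon_result = ET.fromstring("<script>var a = 1; var b = 2;</script>").text
```

If this reading is right, escaping < as &lt; and & as &amp; would also make the file well-formed, but browsers would then no longer run the script as written, which is why CDATA is the usual choice.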

Stan


 
 
 
 
 
Cowboy (Gregory A. Beamer)
 
      01-19-2008
Try wrapping the script in <!-- and -->, which is standard practice. I imagine
some parsers will still choke on this approach, but it should solve the major
issue.

Can you solve this without changing the file at all? Probably not. That is the
nature of free-form sections: XML parsers do not treat them the way HTML
parsers do, because XML's rules are stricter.
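
A rough Python sketch of what the comment approach does under a strict XML parser, using the standard-library xml.etree.ElementTree (the insert_comments option needs Python 3.8 or later). The sample strings and variable names are assumptions for illustration. It shows two caveats: the default parser discards the comment, so the script body is no longer the element's text, and a script containing -- (such as a decrement) is not a legal XML comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical script elements whose bodies are wrapped in an XML comment.
COMMENTED = "<script><!-- for (var i = 0; i < 3; i++) { total += i; } --></script>"
DECREMENT = "<script><!-- for (var i = 3; i > 0; i--) { total += i; } --></script>"

# Default parser: well-formed, but the comment is dropped from the tree.
plain = ET.fromstring(COMMENTED)

# Parser configured to keep comments as child nodes (Python 3.8+).
keeping = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
kept = ET.fromstring(COMMENTED, parser=keeping)
comment_body = kept[0].text  # the script source, recovered from the comment

# "--" inside a comment is not allowed in XML, so "i--" breaks the parse.
try:
    ET.fromstring(DECREMENT)
    decrement_ok = True
except ET.ParseError:
    decrement_ok = False
```

So the comment wrapper may let the file load, but code that needs to read the script section has to look for a comment node instead of element text.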

--
Gregory A. Beamer
MVP, MCP: +I, SE, SD, DBA

*************************************************
| Think outside the box!
|
*************************************************
 
 
 
 
Peter Bromberg [C# MVP]
 
      01-19-2008
You could try Simon Mourier's Html Agility Pack, which can be found on
codeplex.com.
It is built around an HtmlDocument class that parses the page's HTML into an
XPath-compatible document object that works "just like" XmlDocument.
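
The general idea, using a lenient HTML parser instead of a strict XML one, can be sketched in Python with the standard-library html.parser module. This is an analogy only, not the Html Agility Pack API; the SectionCollector class, its methods, and the sample page are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect the text found inside <title>, <script> and <body>."""

    WANTED = {"title", "script", "body"}

    def __init__(self):
        super().__init__()
        self.sections = {name: [] for name in self.WANTED}
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in self.WANTED:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Script content arrives here as raw text, "<" included.
        if self._current is not None:
            self.sections[self._current].append(data)

    def text(self, name):
        return "".join(self.sections[name]).strip()


PAGE = """<html>
<head>
<title>Demo</title>
<script language="javascript">
for (var i = 0; i < 3; i++) { total += i; }
</script>
</head>
<body>
<p>Hello</p>
</body>
</html>"""

collector = SectionCollector()
collector.feed(PAGE)
collector.close()
```

After the run, collector.text("script") holds the unmodified JavaScript, with no CDATA wrapper needed in the source file.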
-- Peter
Site: http://www.eggheadcafe.com
UnBlog: http://petesbloggerama.blogspot.com
MetaFinder: http://www.blogmetafinder.com