Extracting data from XHTML file into another

 
 
chris_huh
 
      05-26-2009
Is there a way to extract data from one XHTML file and create another
one with it? I want to create a basic file with all the headlines from
a news page listed in it (like an RSS feed).
 
Martin Honnen
 
      05-26-2009
chris_huh wrote:
> Is there a way to extract data from one XHTML file and create another
> one with it? I want to create a basic file with all the headlines from
> a news page listed in it (like an RSS feed).


Well, XHTML is supposed to be XML, so in theory you should be able to use
any XML parser or XML API to extract data, such as XPath, XSLT, or
XQuery. In practice, however, lots of XHTML is served as text/html and is
often not well-formed XML, so XML parsers might fail to process it.
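
As a rough illustration of the XML-parser route, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. It assumes the page is well-formed XHTML and that the headlines sit in <h2> elements; the sample markup and the names extract_headlines and build_list are made up for the example, since the real page is not shown in this thread.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page; the real markup is not shown in the thread.
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>News</title></head>
  <body>
    <h2>First story</h2>
    <p>Some text.</p>
    <h2>Second story</h2>
  </body>
</html>"""


def extract_headlines(xhtml_text, tag="h2"):
    """Return the text of every <tag> element in the XHTML namespace."""
    root = ET.fromstring(xhtml_text)
    qualified = f"{{{XHTML_NS}}}{tag}"
    return ["".join(el.itertext()).strip() for el in root.iter(qualified)]


def build_list(headlines):
    """Serialize the headlines as a <ul> fragment (no namespace, no doctype)."""
    ul = ET.Element("ul")
    for text in headlines:
        ET.SubElement(ul, "li").text = text
    return ET.tostring(ul, encoding="unicode")


if __name__ == "__main__":
    print(build_list(extract_headlines(SAMPLE_PAGE)))
```

ET.fromstring raises ET.ParseError on input that is not well-formed, which is exactly the failure mode mentioned above for pages that are only nominally XHTML.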

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
Joe Kesselman
 
      05-26-2009
Martin Honnen wrote:
> Well, XHTML is supposed to be XML, so in theory you should be able to use
> any XML parser or XML API to extract data, such as XPath, XSLT, or
> XQuery. In practice, however, lots of XHTML is served as text/html and is
> often not well-formed XML, so XML parsers might fail to process it.


Even if served as text/html, appropriate software could recognize and
process it as XHTML and thus XML. You'd have to either know to expect
XHTML or have a prebuffer/prescan pass to check that.

Note that if it isn't well-formed XML, it really isn't XHTML, no matter
what the document's contents claim. That's one of the major differences
between XHTML and HTML -- HTML was SGML-based and allowed some
shortcuts/sloppiness that the XML-based XHTML doesn't.
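
A prescan of that kind can be as simple as attempting a parse and reporting where it fails. The following is a minimal Python sketch, assuming the whole document is available as a string; check_well_formed is a hypothetical helper name, and the check covers well-formedness only, not validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, else a (line, column, message) tuple."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # position of the first fatal error
        return line, column, str(err)
    return None


if __name__ == "__main__":
    # HTML-style shortcut: <br> is never closed, so this is not XML.
    print(check_well_formed("<p>one<br>two</p>"))
    # The XML-style empty element is fine.
    print(check_well_formed("<p>one<br/>two</p>"))
```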
 
chris_huh
 
      05-26-2009
On 26 May, 13:23, Joe Kesselman <(E-Mail Removed)> wrote:
> Martin Honnen wrote:
> > Well, XHTML is supposed to be XML, so in theory you should be able to use
> > any XML parser or XML API to extract data, such as XPath, XSLT, or
> > XQuery. In practice, however, lots of XHTML is served as text/html and is
> > often not well-formed XML, so XML parsers might fail to process it.
>
> Even if served as text/html, appropriate software could recognize and
> process it as XHTML and thus XML. You'd have to either know to expect
> XHTML or have a prebuffer/prescan pass to check that.
>
> Note that if it isn't well-formed XML, it really isn't XHTML, no matter
> what the document's contents claim. That's one of the major differences
> between XHTML and HTML -- HTML was SGML-based and allowed some
> shortcuts/sloppiness that the XML-based XHTML doesn't.


I have tried using a server-based approach (using the tutorial from
the w3schools site - http://www.w3schools.com/xsl/xsl_server.asp) but
it doesn't seem to accept the XHTML file (the ASP file just
continuously loads). Is this the wrong approach? I don't know much
about XML.

What I have is an XHTML file (with a .shtml extension) which is used
as an index page for a section in a news site (I have ten sections).
I want another file generated that contains an unordered list of the
items on that index page. Each index page has three top stories, and I
want the generated file to hold these three stories in a <ul>.
These 10 generated files (one for each section) will
then be included in the top index page. Does that explain it well?
 
Martin Honnen
 
      05-26-2009
chris_huh wrote:

> I have tried using a server-based approach (using the tutorial from
> the w3schools site - http://www.w3schools.com/xsl/xsl_server.asp) but
> it doesn't seem to accept the XHTML file (the ASP file just
> continuously loads). Is this the wrong approach? I don't know much
> about XML.
>
> What I have is an XHTML file (with a .shtml extension) which is used
> as an index page for a section in a news site (I have ten sections).


Aren't .shtml files usually ones making use of SSI (server-side
includes)? Are you loading the document from the file system or over
HTTP? Can you post the URL to a sample document?

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
chris_huh
 
      05-26-2009
On 26 May, 14:06, Martin Honnen <(E-Mail Removed)> wrote:
> chris_huh wrote:
> > I have tried using a server-based approach (using the tutorial from
> > the w3schools site - http://www.w3schools.com/xsl/xsl_server.asp) but
> > it doesn't seem to accept the XHTML file (the ASP file just
> > continuously loads). Is this the wrong approach? I don't know much
> > about XML.
> >
> > What I have is an XHTML file (with a .shtml extension) which is used
> > as an index page for a section in a news site (I have ten sections).
>
> Aren't .shtml files usually ones making use of SSI (server-side
> includes)? Are you loading the document from the file system or over
> HTTP? Can you post the URL to a sample document?
>
> --
>
> Martin Honnen
> http://msmvps.com/blogs/martin_honnen/


Yeah, I am using SSI to include other files in .shtml files. I can't
send a link as I am using it on a closed server.

I am using:

<!--#include virtual="/includes/navigation.sssi" -->

to include the files. I guess that means it is over HTTP. The idea (if
this is even possible) is to include each of these generated files in
the same way.
 
Martin Honnen
 
      05-26-2009
chris_huh wrote:

> Yeah, I am using SSI to include other files in .shtml files. I can't
> send a link as I am using it on a closed server.
>
> I am using:
>
> <!--#include virtual="/includes/navigation.sssi" -->
>
> to include the files. I guess that means it is over HTTP.


If you use SSI then reading a file from the file system would not
process any of those SSI instructions.
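
To see whether a page read straight from disk still contains unexpanded SSI directives, a small scan like the following could help. It is a Python sketch rather than ASP, the name find_ssi_includes is made up for illustration, and it only matches #include directives with a virtual or file attribute written in the form quoted above.

```python
import re

# Matches directives such as <!--#include virtual="/path/file" -->
SSI_INCLUDE = re.compile(r'<!--#include\s+(virtual|file)="([^"]*)"\s*-->')


def find_ssi_includes(raw_text):
    """Return (attribute, path) pairs for each unexpanded SSI include."""
    return [(m.group(1), m.group(2)) for m in SSI_INCLUDE.finditer(raw_text)]


if __name__ == "__main__":
    raw = '<div><!--#include virtual="/includes/navigation.sssi" --></div>'
    for attribute, path in find_ssi_includes(raw):
        print(attribute, path)
```

If the scan finds anything, the text was read before the web server had a chance to expand the includes, so an XML parser sees the directives only as comments and the included markup is missing.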

It is hard to tell what goes wrong without being able to check the
X(HT)ML documents you have. If you use classic ASP and have trouble
getting your code to work, then you might want to ask in a newsgroup
dedicated to that: microsoft.public.inetserver.asp.general


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
chris_huh
 
      05-26-2009
On May 26, 2:41 pm, Martin Honnen <(E-Mail Removed)> wrote:
> chris_huh wrote:
> > Yeah, I am using SSI to include other files in .shtml files. I can't
> > send a link as I am using it on a closed server.
> >
> > I am using:
> >
> > <!--#include virtual="/includes/navigation.sssi" -->
> >
> > to include the files. I guess that means it is over HTTP.
>
> If you use SSI then reading a file from the file system would not
> process any of those SSI instructions.
>
> It is hard to tell what goes wrong without being able to check the
> X(HT)ML documents you have. If you use classic ASP and have trouble
> getting your code to work, then you might want to ask in a newsgroup
> dedicated to that: microsoft.public.inetserver.asp.general
>
> --
>
> Martin Honnen
> http://msmvps.com/blogs/martin_honnen/


Yeah, I suppose it could be more of an issue with ASP. The ASP code I
use is:

<%
'Load XML
set xml = Server.CreateObject("Microsoft.XMLDOM")
xml.async = false
xml.load(Server.MapPath("/iraq/index.shtml"))

'Load XSL
set xsl = Server.CreateObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(Server.MapPath("/includes/style.xsl"))

'Transform file
Response.Write(xml.transformNode(xsl))
%>

When the Server.MapPath target is an .xml file it works OK, but when it
is a .shtml file, loaded from the actual file itself, it just crashes. It
could be something wrong with the .shtml file (maybe it isn't correct
XHTML) or maybe you can't use a .shtml file. That's what I wasn't sure
about.
 
Martin Honnen
 
      05-26-2009
chris_huh wrote:

> Yeah, I suppose it could be more of an issue with ASP. The ASP code I
> use is:
>
> <%
> 'Load XML
> set xml = Server.CreateObject("Microsoft.XMLDOM")
> xml.async = false
> xml.load(Server.MapPath("/iraq/index.shtml"))

> When the Server.MapPath target is an .xml file it works OK, but when it
> is a .shtml file, loaded from the actual file itself, it just crashes. It
> could be something wrong with the .shtml file (maybe it isn't correct
> XHTML) or maybe you can't use a .shtml file. That's what I wasn't sure
> about.


You can check for parse errors with MSXML as follows; put this after the
load call:
  If xml.parseError.errorCode <> 0 Then
    Response.Write xml.parseError.reason
  End If


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
chris_huh
 
      05-26-2009
On May 26, 3:42 pm, Martin Honnen <(E-Mail Removed)> wrote:
> chris_huh wrote:
> > Yeah, I suppose it could be more of an issue with ASP. The ASP code I
> > use is:
> >
> > <%
> > 'Load XML
> > set xml = Server.CreateObject("Microsoft.XMLDOM")
> > xml.async = false
> > xml.load(Server.MapPath("/iraq/index.shtml"))
> >
> > When the Server.MapPath target is an .xml file it works OK, but when it
> > is a .shtml file, loaded from the actual file itself, it just crashes. It
> > could be something wrong with the .shtml file (maybe it isn't correct
> > XHTML) or maybe you can't use a .shtml file. That's what I wasn't sure
> > about.
>
> You can check for parse errors with MSXML as follows; put this after the
> load call:
>   If xml.parseError.errorCode <> 0 Then
>     Response.Write xml.parseError.reason
>   End If
>
> --
>
> Martin Honnen
> http://msmvps.com/blogs/martin_honnen/


I tried that but the script just times out.

I also tried to validate the .shtml file (which is XHTML) and the only
errors were about some XML markup I had added. I put <headline> tags
around each title so that I could extract it, and the validator says
that they are wrong. Is that not how you do this?
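
For context on the validator complaint: the XHTML DTDs do not declare a <headline> element, so a validator will flag it, but a document containing it can still be well-formed XML, and a parser that only checks well-formedness can still read it. The Python sketch below shows both that and a common alternative, marking headlines with a class attribute on a standard element, which keeps the page valid. The sample markup and the function names are hypothetical; the actual page is not shown here.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical markup showing both ways of marking a headline.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div><headline>Custom element story</headline></div>
    <div><h3 class="headline">Class-marked story</h3></div>
  </body>
</html>"""


def by_custom_element(root):
    """Headlines wrapped in a non-standard <headline> element."""
    return [el.text for el in root.iter(f"{{{XHTML_NS}}}headline")]


def by_class(root, class_name="headline"):
    """Headlines on any element whose class attribute includes class_name."""
    return [
        "".join(el.itertext()).strip()
        for el in root.iter()
        if class_name in el.get("class", "").split()
    ]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(by_custom_element(root))
    print(by_class(root))
```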
 