Velocity Reviews

chris_huh 05-26-2009 11:12 AM

Extracting data from XHTML file into another
 
Is there a way to extract data from one XHTML file and create another
one with it? I want to create a basic file with all the headlines from
a news page listed in it (like an RSS feed).

Martin Honnen 05-26-2009 11:35 AM

Re: Extracting data from XHTML file into another
 
chris_huh wrote:
> Is there a way to extract data from one XHTML file and create another
> one with it? I want to create a basic file with all the headlines from
> a news page listed in it (like an RSS feed).


Well, XHTML is supposed to be XML, so in theory you should be able to use
any XML parser or XML API to extract data, for example XPath, XSLT, or
XQuery. In practice, however, a lot of XHTML is served as text/html and is
often not well-formed XML, so XML parsers might fail to process it.
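
To illustrate the "in theory" case, here is a minimal sketch in Python (not the ASP/MSXML setup discussed later in this thread). It assumes the page is well-formed XHTML and that headlines are h2 elements; the sample markup and the extract_headlines name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; the structure and the use of <h2> are assumptions.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>News</title></head>
  <body>
    <h2>First story</h2>
    <p>Some text.</p>
    <h2>Second story</h2>
  </body>
</html>"""

# XHTML elements live in this namespace, so path expressions must name it.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_headlines(xhtml_text):
    """Return the text of every h2 element in a well-formed XHTML string."""
    root = ET.fromstring(xhtml_text)
    return [h.text for h in root.findall(".//x:h2", NS)]


print(extract_headlines(XHTML))
```

If the input is not well-formed, ET.fromstring raises a ParseError instead of returning a tree, which is the "in practice" failure described above.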

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/

Joe Kesselman 05-26-2009 12:23 PM

Re: Extracting data from XHTML file into another
 
Martin Honnen wrote:
> Well, XHTML is supposed to be XML, so in theory you should be able to use
> any XML parser or XML API to extract data, for example XPath, XSLT, or
> XQuery. In practice, however, a lot of XHTML is served as text/html and is
> often not well-formed XML, so XML parsers might fail to process it.


Even if served as text/html, appropriate software could recognize and
process it as XHTML and thus XML. You'd have to either know to expect
XHTML or have a prebuffer/prescan pass to check for it.

Note that if it isn't well-formed XML, it really isn't XHTML, no matter
what the document's contents claim. That's one of the major differences
between XHTML and HTML: HTML was SGML-based and allowed some
shortcuts/sloppiness that the XML-based XHTML doesn't.
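
As a rough sketch of that difference (in Python, with made-up sample strings and a hypothetical helper name), an XML parser rejects an unclosed <br> that HTML would tolerate, but accepts the XHTML form:

```python
import xml.etree.ElementTree as ET


def well_formedness_error(text):
    """Return None if text parses as XML, else the parser's error message."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as err:
        return str(err)


# HTML-style shortcut: <br> is never closed, so </p> does not match it.
html_style = "<p>First line<br>Second line</p>"
# XHTML style: the empty element is explicitly closed.
xhtml_style = "<p>First line<br />Second line</p>"

print(well_formedness_error(html_style))   # an error message from the parser
print(well_formedness_error(xhtml_style))  # None
```

A check like this could also serve as the prescan pass mentioned above, before deciding whether to hand a document to an XML toolchain.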

chris_huh 05-26-2009 12:33 PM

Re: Extracting data from XHTML file into another
 
I have tried using a server-based approach (following the w3schools
tutorial at http://www.w3schools.com/xsl/xsl_server.asp), but it
doesn't seem to accept the XHTML file (the ASP page just loads
continuously). Is this the wrong approach? I don't know much about XML.

What I have is an XHTML file (with a .shtml extension) which is used
as the index page for a section of a news site (I have ten sections).
I want another file generated that contains an unordered list of the
items on that index page. Each index page has three top stories, and I
want the generated file to hold these three stories in a <ul>.
The ten generated files (one for each section) will then be included
in the top-level index page. Does that explain it well?
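
The goal described above could be sketched roughly like this in Python (not the ASP/XSLT route tried in this thread). It assumes the section page is well-formed XHTML and that each story is an h2 containing a link; the sample markup, the story paths, and the top_stories_list name are all hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical section index page; real markup will differ.
SECTION_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <h2><a href="/iraq/story1.shtml">Story one</a></h2>
    <h2><a href="/iraq/story2.shtml">Story two</a></h2>
    <h2><a href="/iraq/story3.shtml">Story three</a></h2>
    <h2><a href="/iraq/story4.shtml">Story four</a></h2>
  </body>
</html>"""


def top_stories_list(page_text, limit=3):
    """Build a <ul> fragment linking to the first `limit` stories."""
    root = ET.fromstring(page_text)
    links = root.findall(".//{%s}h2/{%s}a" % (XHTML_NS, XHTML_NS))
    ul = ET.Element("ul")
    for link in links[:limit]:
        li = ET.SubElement(ul, "li")
        a = ET.SubElement(li, "a", href=link.get("href", ""))
        a.text = link.text
    # Serialize without a namespace so the fragment can be dropped into a page.
    return ET.tostring(ul, encoding="unicode")


fragment = top_stories_list(SECTION_PAGE)
print(fragment)
```

The returned string could be written to one file per section and then pulled into the top-level index page by whatever include mechanism the site already uses.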

Martin Honnen 05-26-2009 01:06 PM

Re: Extracting data from XHTML file into another
 
chris_huh wrote:

> I have tried using a server-based approach (following the w3schools
> tutorial at http://www.w3schools.com/xsl/xsl_server.asp), but it
> doesn't seem to accept the XHTML file (the ASP page just loads
> continuously). Is this the wrong approach? I don't know much about XML.
>
> What I have is an XHTML file (with a .shtml extension) which is used
> as the index page for a section of a news site (I have ten sections).


Aren't .shtml files usually ones making use of SSI (server-side
includes)? Are you loading the document from the file system or over
HTTP? Can you post the URL to a sample document?

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/

chris_huh 05-26-2009 01:14 PM

Re: Extracting data from XHTML file into another
 
Yeah, I am using SSI to include other files in the .shtml files. I can't
send a link as I am using it on a closed server.

I am using:

<!--#include virtual="/includes/navigation.sssi" -->

to include the files. I guess that means it is over HTTP. The idea (if
this is even possible) is to include each of the generated files using
the same kind of directive.

Martin Honnen 05-26-2009 01:41 PM

Re: Extracting data from XHTML file into another
 
chris_huh wrote:

> Yeah, I am using SSI to include other files in the .shtml files. I can't
> send a link as I am using it on a closed server.
>
> I am using:
>
> <!--#include virtual="/includes/navigation.sssi" -->
>
> to include the files. I guess that means it is over HTTP.


If you use SSI then reading a file from the file system would not
process any of those SSI instructions.
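
A small Python sketch of why that is (the sample fragment is made up): to an XML parser reading the raw file, the SSI directive is just an ordinary comment, so whatever the web server would have included is simply absent from the parsed tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical raw .shtml content as it sits on disk, before the web
# server has had a chance to expand the include directive.
RAW_FILE = """<div>
  <!--#include virtual="/includes/navigation.sssi" -->
  <h2>Top story</h2>
</div>"""

root = ET.fromstring(RAW_FILE)

# ElementTree's default parser skips comments, so only real elements remain.
child_tags = [child.tag for child in root]
print(child_tags)
```

Fetching the same page over HTTP from the server would instead return the document with the include already expanded, assuming the server is configured to process SSI for that file.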

It is hard to tell what is going wrong without being able to check the
X(HT)ML documents you have. If you use classic ASP and have trouble
getting your code to work, you might want to ask in a newsgroup
dedicated to that: microsoft.public.inetserver.asp.general


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/

chris_huh 05-26-2009 02:39 PM

Re: Extracting data from XHTML file into another
 
Yeah, I suppose it could be more of an issue with ASP. The ASP code I
use is:

<%
'Load XML
set xml = Server.CreateObject("Microsoft.XMLDOM")
xml.async = false
xml.load(Server.MapPath("/iraq/index.shtml"))

'Load XSL
set xsl = Server.CreateObject("Microsoft.XMLDOM")
xsl.async = false
xsl.load(Server.MapPath("/includes/style.xsl"))

'Transform file
Response.Write(xml.transformNode(xsl))
%>

When the Server.MapPath argument is an .xml file it works OK, but when
it is a .shtml file, loaded from the actual file itself, it just crashes.
There could be something wrong with the .shtml file (maybe it isn't
correct XHTML), or maybe you can't use a .shtml file. That's what I
wasn't sure about.

Martin Honnen 05-26-2009 02:42 PM

Re: Extracting data from XHTML file into another
 
chris_huh wrote:

> Yeah, I suppose it could be more of an issue with ASP. The ASP code I
> use is:
>
> <%
> 'Load XML
> set xml = Server.CreateObject("Microsoft.XMLDOM")
> xml.async = false
> xml.load(Server.MapPath("/iraq/index.shtml"))

> When the Server.MapPath argument is an .xml file it works OK, but when
> it is a .shtml file, loaded from the actual file itself, it just crashes.
> There could be something wrong with the .shtml file (maybe it isn't
> correct XHTML), or maybe you can't use a .shtml file. That's what I
> wasn't sure about.


You can check for parse errors with MSXML as follows; put this after the
load call:

  If xml.parseError.errorCode <> 0 Then
    Response.Write xml.parseError.reason
  End If


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/

chris_huh 05-26-2009 02:57 PM

Re: Extracting data from XHTML file into another
 
I tried that but the script just times out.

Also, I tried to validate the .shtml file (which is XHTML) and the only
errors were for some XML markup I had added. I put in <headline> tags
around each title so that I could extract it, and the validator says
that they are wrong. Is that not how you do this?
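
One likely reading of those validator errors, offered as an assumption rather than a diagnosis: the XHTML DTDs do not define a headline element, so a validator flags it even though the document can still be well-formed XML. The Python sketch below (sample markup made up) shows that a non-validating XML parser extracts a custom <headline> element without complaint, and that marking a standard element with a class attribute is an alternative that keeps the page valid.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}

# Custom element: well-formed XML, but not part of the XHTML vocabulary.
CUSTOM_TAG = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <headline>Story one</headline>
</body></html>"""

# Alternative: a standard element marked with a (hypothetical) class name.
CLASS_ATTR = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <h2 class="headline">Story one</h2>
</body></html>"""

# ElementTree does not validate against a DTD, so both documents parse.
custom = [e.text for e in ET.fromstring(CUSTOM_TAG).findall(".//x:headline", NS)]
by_class = [
    e.text
    for e in ET.fromstring(CLASS_ATTR).findall(".//*[@class='headline']")
]

print(custom)
print(by_class)
```

Either form can be selected by an XML tool; the class-based version has the advantage that the source page should still pass an XHTML validator.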

