Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > XML > Extracting data from XHTML file into another


Extracting data from XHTML file into another

chris_huh
      05-26-2009
On 26 May, 18:06, Martin Honnen <(E-Mail Removed)> wrote:
> chris_huh wrote:
> >     <xsl:for-each select="xhtml:html/xhtml:body/xhtml:item">
> >     <li>
> >       <xsl:value-of select="xhtml:headline" />
>
> I think those headline elements are deep down inside the table, so you
> either have to spell out the complete path or use
>          <xsl:value-of select="descendant::xhtml:headline"/>
>
> --
>
>         Martin Honnen
>        http://msmvps.com/blogs/martin_honnen/


Oh, so I need to specify every single step of the path. I will have a go
tomorrow and see if that works.
Thanks a lot.
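For comparison, the same "search at any depth instead of spelling out the path" idea can be sketched outside XSLT with Python's standard `xml.etree.ElementTree` module. This is only an illustration: the sample markup is hypothetical (the real page was not posted in full), and it assumes the `headline` elements sit in the XHTML default namespace, as the `xhtml:` prefixes in the stylesheet suggest.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: headline elements nested deep inside a table,
# all in the XHTML default namespace like the poster's page (assumed).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table><tr><td>
      <item><headline>First story</headline></item>
      <item><headline>Second story</headline></item>
    </td></tr></table>
  </body>
</html>"""

# Bind a prefix to the XHTML namespace, as xmlns:xhtml does in the stylesheet.
NS = {"xhtml": "http://www.w3.org/1999/xhtml"}


def headlines(xhtml_text):
    """Return the text of every xhtml:headline at any depth below the root."""
    root = ET.fromstring(xhtml_text)
    # Spelling out body/table/tr/td/item/headline would also work, but ".//"
    # searches all descendants, much like descendant::xhtml:headline in XPath.
    return [h.text for h in root.iterfind(".//xhtml:headline", NS)]


print(headlines(SAMPLE))  # ['First story', 'Second story']
```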
 
chris_huh
      05-27-2009

Ah yes, that works great. I've got all the headlines coming up in a
ul.

Is it possible to extract the href value too? I was thinking of something
like <a href="{descendant::xhtml:href}"> but that doesn't work.

Thanks
 
Martin Honnen
      05-27-2009
chris_huh wrote:

> Is it possible to extract the href value too? I was thinking of something
> like <a href="{descendant::xhtml:href}"> but that doesn't work.


XHTML does not have any 'href' element, and I don't see any 'href'
elements in the markup you posted.
In XHTML the 'a' elements have an 'href' attribute, so you could try
  descendant::xhtml:a/@href
if that is what you are looking for, but in the sample you posted earlier
you only have
  <a href="#">
so I am not sure that is what you are looking for.
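The distinction above, between an 'href' element (which XHTML does not have) and the 'href' attribute of 'a', can be seen in a small Python sketch using the standard `xml.etree.ElementTree` module. The markup is hypothetical. Note that an unprefixed attribute such as `href` is in no namespace, so only the element name needs the XHTML prefix.

```python
import xml.etree.ElementTree as ET

NS = {"xhtml": "http://www.w3.org/1999/xhtml"}

# Hypothetical markup: each headline is a link.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <h4><a href="news/1.html">First story</a></h4>
  <h4><a href="#">Second story</a></h4>
</body></html>"""


def links(xhtml_text):
    """Return (text, href) for every xhtml:a, like descendant::xhtml:a/@href."""
    root = ET.fromstring(xhtml_text)
    # href is an attribute of a, not a child element, and unprefixed
    # attributes are in no namespace, so a plain .get("href") reads it.
    return [(a.text, a.get("href")) for a in root.iterfind(".//xhtml:a", NS)]


def element_search(xhtml_text):
    """Looking for an href *element* instead finds nothing."""
    return ET.fromstring(xhtml_text).findall(".//xhtml:href", NS)


print(links(SAMPLE))  # [('First story', 'news/1.html'), ('Second story', '#')]
print(element_search(SAMPLE))  # []
```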


--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
chris_huh
      05-27-2009

Yes, sorry, that is what I meant: the href attribute of the a element.

And that worked great.

Thanks for all your help. I think everything is working as I wanted
now. There is one more thing, though (more of a cherry-on-top
sort of thing). At the moment I have to use an ASP page to
create the output. If I then try to include that page in an .shtml file
using SSI includes, it will not work, because the .shtml file isn't an
ASP page, so I can't use .shtml files.

Is there a way with XML to force it to write an external
file? So you have an input file, an XSL file, a file that does all the
transforming and then an output file (which could be .ssi). This
would just save me having to use .ASP pages for this one thing.
But if that isn't possible, it's no problem.
 
Martin Honnen
      05-27-2009
chris_huh wrote:

> Is there a way with XML to force it to write an external
> file? So you have an input file, an XSL file, a file that does all the
> transforming and then an output file (which could be .ssi). This
> would just save me having to use .ASP pages for this one thing.


Your ASP uses MSXML objects with VBScript. VBScript can also be used in
Windows Script Host (WSH) files, so instead of embedding your script code
in ASP you could write a .vbs file (e.g. prog.vbs) and execute that with
WSH (e.g. by running 'cscript prog.vbs' at a command prompt). The main
change would be to use
  Set xml = CreateObject(...)
instead of
  Set xml = Server.CreateObject(...)
and then, obviously, you would need to write files instead of
calling Response.Write to send output to the browser.
See MSDN for
WSH: http://msdn.microsoft.com/en-us/libr...3k(VS.85).aspx
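The suggestion above is VBScript under WSH. As a rough analogue of the same workflow (a standalone script that reads the input page and writes the result to a file an .shtml page could include, instead of sending it to a browser), here is a Python sketch. Python's standard library has no XSLT processor, so this swaps the XSL step for `xml.etree.ElementTree`; the function names `build_fragment` and `main` and the file names in the usage comment are hypothetical. (The third-party lxml package does provide an XSLT processor if you want to keep the stylesheet.)

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path

NS = {"xhtml": "http://www.w3.org/1999/xhtml"}


def build_fragment(xhtml_text):
    """Turn every xhtml:a link in the page into an item of a <ul> fragment."""
    root = ET.fromstring(xhtml_text)
    ul = ET.Element("ul")
    for a in root.iterfind(".//xhtml:a", NS):
        li = ET.SubElement(ul, "li")
        link = ET.SubElement(li, "a", href=a.get("href", ""))
        link.text = a.text
    # tostring escapes &, < and > in text and attribute values.
    return ET.tostring(ul, encoding="unicode")


def main(source, target):
    """Read the input page and write the fragment to a file, not a browser."""
    fragment = build_fragment(Path(source).read_text(encoding="utf-8"))
    Path(target).write_text(fragment, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python make_headlines.py news.xhtml headlines.ssi
    main(sys.argv[1], sys.argv[2])
```

Run from a scheduled task, a script like this would regenerate the include file without needing an ASP page at request time.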

If you need help with that then I suggest you find a VBScript newsgroup
on news.microsoft.com

Other options, obviously, would be to not use scripting languages but rather
more modern approaches like the .NET Framework and its XML classes/APIs
to solve the problem. There are free Visual Studio Express editions for
VB.NET and C#, where you would have the advantage of IDE support
like IntelliSense when writing your programs.

--

Martin Honnen
http://msmvps.com/blogs/martin_honnen/
 
chris_huh
      05-27-2009

Thanks, I might look into that, although it might be a bit of overkill.

Thanks a lot.
 
Peter Flynn
      05-27-2009
chris_huh wrote:
> Is there a way to extract data from one XHTML file and create another
> one with it? I want to create a basic file with all the headlines from
> a news page listed in it (like an RSS feed).


Pass it through HTML Tidy to ensure it becomes XHTML, then use an XML
processor to extract the bits you want with XPath expressions. This is a
form of screen-scraping, and it is used quite extensively to extract
information like headlines to create RSS feeds and the like.

Suppose you inspect the document, and after it has been Tidied you find
that the headlines are all in H4 elements inside a DIV whose class is
"news". In an XSLT transformation you could write something like

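<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xhtml">

  <!-- Tidy's XHTML output puts every element in the XHTML namespace,
       so element names in the XPath need a prefix bound to it -->
  <xsl:template match="/">
    <html>
      <head><title>Copied headlines</title></head>
      <body>
        <ul>
          <xsl:apply-templates select="//xhtml:div[@class='news']/xhtml:h4"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <!-- each selected headline becomes one list item -->
  <xsl:template match="xhtml:h4">
    <li>
      <xsl:value-of select="."/>
    </li>
  </xsl:template>

</xsl:stylesheet>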
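To check which nodes such an expression selects, the same path can be tried from Python's standard `xml.etree.ElementTree` module, which supports a subset of XPath that includes the `[@class='news']` predicate. A minimal sketch, using a hypothetical page and assuming the Tidied output declares the XHTML default namespace (as Tidy's XHTML output normally does); the unqualified version shows why element names need a namespace prefix.

```python
import xml.etree.ElementTree as ET

NS = {"xhtml": "http://www.w3.org/1999/xhtml"}

# Hypothetical Tidy output for a news page.
TIDIED = """<html xmlns="http://www.w3.org/1999/xhtml"><body>
  <div class="news"><h4>Budget approved</h4><h4>Library opens</h4></div>
  <div class="sidebar"><h4>Weather</h4></div>
</body></html>"""


def news_headlines(xhtml_text):
    """Text of each h4 that is a child of a div with class="news"."""
    root = ET.fromstring(xhtml_text)
    path = ".//xhtml:div[@class='news']/xhtml:h4"
    return [h4.text for h4 in root.iterfind(path, NS)]


def unqualified(xhtml_text):
    """The same path without a namespace prefix: it matches no elements."""
    return ET.fromstring(xhtml_text).findall(".//div[@class='news']/h4")


print(news_headlines(TIDIED))  # ['Budget approved', 'Library opens']
print(unqualified(TIDIED))  # []
```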

///Peter
--
XML FAQ: http://xml.silmaril.ie/
 
chris_huh
      05-28-2009

That's what I had thought about doing, but I wasn't too sure of the
steps. I have got it working using a separate ASP file now, so I think
everything is fine.

Thanks
 