Velocity Reviews - Computer Hardware Reviews

XSLT, HTML to XML, understanding external Website

Arne Pagel
Dear all,

I am currently looking for a way to import information from an external website into my
XSLT/PHP-based lunch order system.
My current idea is to filter the external website with XSLT and convert the relevant information
into an XML file.

At the moment I am trying to import the weekly changing menu of a restaurant.
The problem is that the restaurant's website is probably maintained through a web-based CMS,
so the markup of the page is neither clean nor consistent.

The main problem is that one important delimiter is the line break <BR> inside ordinary text.
I am stuck on how to react to <BR> tags that occur inside a node's text.

Below you will find an extract of the original website.

My current XSLT template only filters out empty nodes:

- - -
<xsl:template match="table/tr/td/div">
  <!-- skip divs whose string value is empty -->
  <xsl:if test=". != ''">
    DIV:<xsl:value-of select="." /> <br/>
  </xsl:if>
</xsl:template>
- - -

Now I want to add the following functionality:
- The template should only apply to a table that contains the phrase "Mittagstischkarte".
- The line breaks <br> within the text should be detected.
- The menus are only clearly separated by the price,
so a number of the format X.XX should be detected.
- Rows that contain only formatting and no real text (A-Z a-z 0-9) should be ignored.

Do you think this can all be done with XSLT?
It would also be possible to split this across several templates called separately from PHP, or to
add some PHP post-processing or intermediate processing.
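
A minimal sketch of how the four points above might be approached in plain XSLT 1.0 (the version PHP's libxslt-based XSL extension implements). It is not a tested solution for this particular site. Assumptions: the HTML parser delivers lowercase element names in no namespace; the output element names menu, entry, line and price are made up for illustration; and the price is matched with a decimal comma (as in "5,50" in the extract) rather than a dot, because times such as "12.00" would otherwise match too. XSLT 1.0 has no regular expressions, so digits are mapped to "9" with translate() and the result is searched for "9,99". Each <br> ends a text node, so iterating over text nodes yields one item per line; inline tags such as <b> or <font> also split text, which this sketch does not try to merge.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" encoding="UTF-8"/>

  <!-- Characters that count as "real text" (A-Z a-z 0-9). -->
  <xsl:variable name="alnum"
      select="'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'"/>

  <xsl:template match="/">
    <menu>
      <!-- Innermost table containing the phrase; inside it, only divs
           whose text holds at least one alphanumeric character.
           //div is used instead of tr/td/div because the repaired
           tree may not contain tr elements (assumption). -->
      <xsl:apply-templates
          select="//table[contains(., 'Mittagstischkarte')]
                         [not(.//table[contains(., 'Mittagstischkarte')])]
                  //div[translate(., $alnum, '') != .]"/>
    </menu>
  </xsl:template>

  <xsl:template match="div">
    <entry>
      <!-- One text node per piece of text between tags such as <br>;
           nodes made only of underscores, dashes or spaces are skipped. -->
      <xsl:for-each select=".//text()[translate(., $alnum, '') != .]">
        <!-- Turn non-breaking spaces into spaces, then trim. -->
        <xsl:variable name="t"
            select="normalize-space(translate(., '&#160;', ' '))"/>
        <xsl:choose>
          <!-- Any digit,digit digit sequence is treated as a price. -->
          <xsl:when test="contains(translate($t, '0123456789',
                                   '9999999999'), '9,99')">
            <price><xsl:value-of select="$t"/></price>
          </xsl:when>
          <xsl:otherwise>
            <line><xsl:value-of select="$t"/></line>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </entry>
  </xsl:template>
</xsl:stylesheet>
```

A line that mixes a dish name and a price in one text node (for example the "Tagessuppe" line) ends up entirely inside price, and a menu whose price sits in a separate div ends up in a separate entry. Grouping lines into menus by the price that follows them is therefore probably easier in the PHP post-processing step.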

Here is the extract of the original website (sorry, the content is German):
- - -
<table width="100%" border="0" cellpadding="0" cellspacing="0">
<td width="30" height="552"></td>
<td width="529" valign="top">
<div align="center"><font size="4"><b>Mittagstischkarte</b></font><br><br><font
size="4"><font size="3">Unser wöchentlich wechselnder Mittagstisch</font></font> <br><font
size="4"><font size="3">von 12.00 bis 14.00 Uhr</font></font></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font></div>
<div align="center"><font size="3"></font>&nbsp;</div>
<div align="center"><b><font size="3">"Eintopf der Woche"</font></b><br>Linseneintopf mit
Bockwurst<br>€ 5,50 <br></div>
<div align="center"><font size="3"></font>&nbsp;</div>
<div align="center"><font size="6">Tagessuppe &nbsp; 1,50 €<br><br></font>&nbsp;<br></div>
<div align="center"><font size="3">Kasseler mit Sauerkraut und Kartoffelpüree<br><b>5,50
€</b><br></font><br><font size="4"><font size="2">__________</font></font><font size="4"><br>kl.
Schnitzel mit Sauce nach Wahl,<br>Bratkartoffeln und Gemüse<br><br></font><font size="4"><b>5,50
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center">______________</div>
<div align="center"></div>
<div align="center"></div>
<div align="center"></div>
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4"></font></div><div align="center"><font size="4"
color="#f0f090">frische Bratwurst<br>mit Bratkartoffeln und Gemüse<br></font></div><div
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4"></font></div>

<div align="center"><font size="4">5,50 €</font></div>
<div align="center">____________</div>
<div align="center"><font size="4"></font></div>
<div align="center"><font size="4">fruchtiges Hähnchengeschnetzeltes<br>im Reisrand mit
Salat<br>5,50 €<br>---------<br></font></div>
<div align="center"></div>
<div align="center"><font size="4">2 Spiegeleier<br>&nbsp;mit Salzkartoffeln und Blattspinat<br>5,50
<div align="center"><font size="4"></font></div>
<div align="center"></div>
<div align="center"><font size="4">_________</font><br><font
size="5"><br>Dessert&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1,50
€</font><br><br><br><br></div><font size="4"><br></font>
<div align="center"><font size="4"><font size="4"></font></font></div><font
size="4">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br><br></font>
<div align="center"></div>
<div align="center"><font size="3"></font></div>
<div align="center"></div> </td>
<td width="30"></td>

This page is loaded via the DOM function loadHTMLFile.
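
Because an HTML parser repairs broken markup while loading, the tree that XSLT sees can differ from the source above (for instance, it is not certain that tr elements exist below the table). A small diagnostic stylesheet, sketched here under the assumption that the loaded document is passed to the XSLT processor unchanged, lists every non-blank text node together with its element path, which may help when choosing match patterns:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Visit every text node that is not whitespace-only.
         Non-breaking spaces are not whitespace for normalize-space(),
         so nodes holding only &nbsp; still show up in the listing. -->
    <xsl:for-each select="//text()[normalize-space(.) != '']">
      <!-- Print the element path from the root down to the parent. -->
      <xsl:for-each select="ancestor::*">
        <xsl:text>/</xsl:text>
        <xsl:value-of select="name()"/>
      </xsl:for-each>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Each output line has the form path: text, so it is easy to see which elements actually wrap the menu text and where a <br> has split it into separate nodes.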

- - -
Regards Arne
