Velocity Reviews - Computer Hardware Reviews

Table/table rows/table data tag question?

 
 
Rio
      11-04-2004
Hi, my goal is to extract data from cells within tables from certain pages
(sportsbooks odds)!

I'm using Java to achieve this: I get the source of the page, place it in a
string, and pass that string (basically the source of the HTML page) to
methods that cut it up sequentially.

First they find whatever it is between
<table.....any text including data, attributes and other tags up
until...../table>

Whatever is in there belongs to table 1 and that substring is cut out and
passed to another method that finds

<tr............anything..................../tr>

that's row 1, that substring is cut out and passed to another method that
finds

<td...............anything............../td> and finally strips other tags
and extracts data.

Finally every cell has its data, table number, row number and cell number.



The program works for the great majority of pages I'm trying to extract data
from. It obviously fails when it encounters a table within a table

<table (1).....
<table (2 within table 1)....
..........
.../table (2)>
/table (1)>

because it cuts from the first <table opening tag until the first /table>
closing tag. That's also a problem I can deal with.
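
One way to deal with the nested case is to count depth while scanning, instead of stopping at the first closing tag. The following is only a rough sketch: the class and method names (NestedTableFinder, findTableEnd) are made up for illustration, and it assumes the index passed in points at a "<table" and that no "<table" text hides inside comments or scripts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedTableFinder {
    // Matches the start of an opening "<table" or closing "</table" tag.
    private static final Pattern TABLE_TAG =
            Pattern.compile("<(/?)table\\b", Pattern.CASE_INSENSITIVE);

    /**
     * Returns the index just past the closing tag that matches the table
     * opening at openIndex, or -1 if the markup ends before it is closed.
     */
    static int findTableEnd(String html, int openIndex) {
        Matcher m = TABLE_TAG.matcher(html);
        int depth = 0;
        int from = openIndex;
        while (m.find(from)) {
            if (m.group(1).isEmpty()) {
                depth++;                       // another table opened
            } else {
                depth--;                       // a table closed
                if (depth == 0) {
                    int gt = html.indexOf('>', m.end());
                    return gt < 0 ? -1 : gt + 1;
                }
            }
            from = m.end();
        }
        return -1;                             // unbalanced markup
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><table><tr><td>in</td></tr></table>"
                + "</td></tr></table><p>after</p>";
        int end = findTableEnd(html, 0);
        System.out.println(html.substring(end)); // prints "<p>after</p>"
    }
}
```

The substring from the opening index to the returned index is then the whole outer table, inner tables included.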


NOW, THE PROBLEM!

But one particular page is giving me a headache. I noticed my program wrongly
counts cells and rows, misplaces data etc. I designed a method TO COUNT EACH
OCCURRENCE OF opening and closing <table, <tr and <td tags and found out that
THE NUMBER OF OPENING AND CLOSING TAGS IS NOT THE SAME, and therefore I can't
design a program that can correctly find what I want.
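
For anyone wanting to reproduce that check, a counting method along these lines would show the mismatch. This is a guess at the approach, not the original code: TagCounter and count are hypothetical names, only table, tr and td are counted (th is ignored), and tag-like text inside comments or scripts is counted too.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagCounter {
    // Matches the start of an opening or closing table, tr or td tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)(table|tr|td)\\b", Pattern.CASE_INSENSITIVE);

    /** Returns counts keyed as "table", "/table", "tr", "/tr", "td", "/td". */
    static Map<String, Integer> count(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String name : new String[] {"table", "tr", "td"}) {
            counts.put(name, 0);
            counts.put("/" + name, 0);
        }
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            // group(1) is "/" for a closing tag and empty for an opening tag
            String key = m.group(1) + m.group(2).toLowerCase(Locale.ROOT);
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Markup that omits some </td> and </tr> end tags.
        System.out.println(count("<TABLE><tr><td>1<td>2<tr><td>3</td></tr></table>"));
    }
}
```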

THE QUESTION IS: How is this possible, and how does IE know where one table
(or table row, or cell) starts and where it ends? Is it possible that some
<table, <tr or <td tags only serve to describe attributes of that table or
row? If so, how can I recognize them?


Big thanks to anyone who just reads this !











 
Jim Higson
      11-04-2004
So you are writing your own HTML parser?

Why not just use the provided ones? (I'm pretty sure there's one in the JRE
already, in javax.swing.text.html or somewhere.) This will deal with nested
tables etc. for you.

If the pages are XHTML, you could even use a generic XML parser, such as
Xerces. One of the advantages of XHTML is easy parsing with generic tools.
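
For reference, the HTML parser that ships with the JRE lives in javax.swing.text.html.parser. A minimal sketch of collecting cell text with it might look like the following. The names SwingCellExtractor and cellTexts are made up for illustration, the sketch does not track table or row numbers, and how the parser recovers from badly broken markup has not been verified here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

class SwingCellExtractor {
    /** Returns the trimmed text of every td element, in document order. */
    static List<String> cellTexts(String html) throws IOException {
        List<String> cells = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private StringBuilder current; // non-null while inside a td

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.TD) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (current != null) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.TD && current != null) {
                    cells.add(current.toString().trim());
                    current = null;
                }
            }
        };
        // The last argument tells the parser to ignore charset declarations.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return cells;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><body><table><tr><td>a</td><td>b</td></tr></table></body></html>";
        System.out.println(cellTexts(html));
    }
}
```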
 
rf
      11-04-2004
Rio wrote:

> THE QUESTION IS: How is it possible and how does IE know where one table
> (table row or cell) starts and where it ends and is it possible that some
> <table <tr or <td tags actually only serve to describe attributes of that
> table or row, if so how can I recognize them?


The closing tag for table rows and cells is optional. Browsers understand
this.

As you parse a td element if you encounter a <td> or a <tr> or a </tr> then
you *imply* a </td> for the td element. Rows are easier, as you parse the tr
element if you encounter a <tr> or a </table> then imply a </tr>.
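
The rules above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not a full parser: ImpliedEndTagParser and parseRows are invented names, nested tables and th cells are not handled, a </table> is also treated as ending an open cell, a td with no open row starts one, and anything still open at the end of the input is flushed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImpliedEndTagParser {
    // Matches complete opening or closing table, tr and td tags.
    private static final Pattern TAG =
            Pattern.compile("<(/?)(table|tr|td)\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        List<String> row = null;  // the open row, or null
        int cellStart = -1;       // where the open cell's content starts, or -1
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase(Locale.ROOT);
            if (name.equals("table") && !closing) {
                continue;         // opening <table>: nothing to imply
            }
            // <td>, </td>, <tr>, </tr> and </table> all end an open cell.
            if (cellStart >= 0) {
                row.add(text(html, cellStart, m.start()));
                cellStart = -1;
            }
            // <tr>, </tr> and </table> all end an open row.
            if (row != null && !name.equals("td")) {
                rows.add(row);
                row = null;
            }
            if (!closing && name.equals("tr")) {
                row = new ArrayList<>();
            } else if (!closing && name.equals("td")) {
                if (row == null) {
                    row = new ArrayList<>(); // assumption: td without tr starts a row
                }
                cellStart = m.end();
            }
        }
        // Assumption: flush whatever is still open at end of input.
        if (cellStart >= 0) {
            row.add(text(html, cellStart, html.length()));
        }
        if (row != null) {
            rows.add(row);
        }
        return rows;
    }

    // Strips any remaining tags from a cell's content and trims it.
    private static String text(String html, int from, int to) {
        return html.substring(from, to).replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>1<td><b>2</b><tr><td>3</td></tr></table>";
        System.out.println(parseRows(html)); // prints [[1, 2], [3]]
    }
}
```

The position of a cell in the returned lists gives its row number and cell number directly.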

--
Cheers
Richard.


 
Rio
      11-05-2004
Thanks a lot for the reply, that's really helpful. How about table tags: are
they supposed to be closed properly? If yes, how is it possible to have an
unequal number of opening and closing table tags?


> The closing tag for table rows and cells is optional. Browsers understand
> this.
>
> As you parse a td element if you encounter a <td> or a <tr> or a </tr> then
> you *imply* a </td> for the td element. Rows are easier, as you parse the tr
> element if you encounter a <tr> or a </table> then imply a </tr>.
>
> --
> Cheers
> Richard.



 
rf
      11-05-2004
Rio wrote:

[top posting corrected]

> > The closing tag for table rows and cells is optional. Browsers understand
> > this.
> >
> > As you parse a td element if you encounter a <td> or a <tr> or a </tr> then
> > you *imply* a </td> for the td element. Rows are easier, as you parse the
> > tr element if you encounter a <tr> or a </table> then imply a </tr>.


> Thanks a lot for the reply that's really helpful, how about table tag,


Element. You are talking about the table *element*. It has an opening tag
and a closing tag and, between these, some content.

> are they supposed to be closed properly,


Check the specification.
http://www.w3.org/TR/html4/struct/ta...tml#edef-TABLE

It says there that the end tag is required.

> if yes how is it possible to have
> unequal number of opening and closing table tags?


The spec says the table element must be closed. This does not mean that
authors *will* close them. The result will be up to browser error
correction.

--
Cheers
Richard.


 