Velocity Reviews - Computer Hardware Reviews

Return HTML between tags with HTML::TokeParser ?

 
 
Maqo
 
      02-23-2005
Is it possible to use HTML::TokeParser to return the raw HTML between
two <A> tags, as opposed to just the text? My source file contains
several blocks of code--containing anchor links for each--that I'm
trying to extract by section while maintaining formatting.

My code:

my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");
while (my $t = $p->get_tag("a")) {
my $name = $t->[1]{name};
next unless $name && ($name eq "anchor");
print "$name : " . $p->get_text("a");

Example HTML source:

<A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>
<A NAME='anchor2'></a><p>Some text and HTML formatting</p><BR>
....
<A NAME='anchor10'></a><p>Some text and HTML formatting</p><BR>

The above code returns the "text and formatting" portions nicely,
albeit only as text. Is there an easy way to do this using
HTML::Parser to return the desired portion, with HTML markup included?
Many thanks.

 
A. Sinan Unur
 
      02-23-2005
"Maqo" <(E-Mail Removed)> wrote in news:1109119459.537290.141800
@c13g2000cwb.googlegroups.com:

> Is it possible to use HTML::TokeParser to return the raw HTML between
> two <A> tags, as opposed to just the text? My source file contains
> several blocks of code--containing anchor links for each--that I'm
> trying to extract by section while maintaining formatting.
>
> My code:
>
> my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");


Cute but counter-productive. Please post real code.

> while (my $t = $p->get_tag("a")) {
> my $name = $t->[1]{name};
> next unless $name && ($name eq "anchor");
> print "$name : " . $p->get_text("a");
>
> Example HTML source:
>
> <A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>


Am I missing something here? There is no text between <a> and </a>
above.

> The above code returns the "text and formatting" portions nicely,
> albeit only as text.


Once the bugs are fixed, the code above runs successfully and produces
no output at all. That is exactly what I expected to see based on the
sample data you provided. Problem solved.

Have you read the posting guidelines?

Sinan
 
Michael Wagg
 
      02-23-2005
A. Sinan Unur wrote:

>>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

>
> Cute but counter-productive. Please post real code.


With the exception of the input filename (which was changed from
"digest.html"), this is the exact code being used.

>>while (my $t = $p->get_tag("a")) {
>>my $name = $t->[1]{name};
>>next unless $name && ($name eq "anchor");
>>print "$name : " . $p->get_text("a");
>>
>>Example HTML source:
>>
>><A NAME='anchor1'></a><p>Some text and HTML formatting</p><BR>

>
>
> Am I missing something here? There is no text between <a> and </a>
> above.


The above code returns the text between one open tag and the next open
tag (<A> -> <A>), not between one open tag and the subsequent closing
tag (<A> -> </A>).
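
For what it's worth, get_text() always strips the tags, so one way to keep the markup is to walk the stream with get_token() and append each token's original source text until the next named anchor turns up. A rough sketch only, not code from anyone in this thread: the sample string, %raw_index, %section and $pending_close are names made up for illustration, and the input is read from a string reference instead of digest.html.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample input modelled on the example above (assumed, not the real file).
# The constructor accepts a reference to a string as well as a file name.
my $html = <<'HTML';
<A NAME='anchor1'></a><p>Some <b>text</b> and HTML formatting</p><BR>
<A NAME='anchor2'></a><p>More text</p><BR>
HTML

my $p = HTML::TokeParser->new(\$html) or die "Can't parse input: $!";

# Position of the original source text inside each get_token() token type.
my %raw_index = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

my (%section, $current, $pending_close);
while (my $t = $p->get_token) {
    my $type = $t->[0];

    # A named anchor starts a new section; its own tags are not stored.
    if ($type eq 'S' && $t->[1] eq 'a' && defined $t->[2]{name}) {
        $current           = $t->[2]{name};
        $section{$current} = '';
        $pending_close     = 1;
        next;
    }
    next unless defined $current;    # ignore anything before the first anchor

    # Skip only the </a> that closes the named anchor, not later link tags.
    if ($type eq 'E' && $t->[1] eq 'a' && $pending_close) {
        $pending_close = 0;
        next;
    }

    # Everything else is appended exactly as it appeared in the source.
    $section{$current} .= $t->[ $raw_index{$type} ];
}

print "$_ : $section{$_}" for sort keys %section;
```

Tag and attribute names come back lowercased, which is why 'a' and {name} match the upper-case source, while the stored text keeps its original case.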
 
 
Sam Holden
 
      02-23-2005
On Wed, 23 Feb 2005 01:50:02 GMT, Michael Wagg <(E-Mail Removed)> wrote:
> A. Sinan Unur wrote:
>
>>>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

>>
>> Cute but counter-productive. Please post real code.

>
> With the exception of the input filename (which was changed from
> "digest.html"), this is the exact code being used.


That's a really silly || with a constant true value on the left.

Why would you bother with code that can not be executed? Especially
when all it could possibly serve to do is to trick other people,
and perhaps yourself, into thinking there's error checking when
there isn't.
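
To make the point concrete, here is a small sketch using only core Perl. open_missing() is a hypothetical stand-in for a constructor that fails; it is not part of HTML::TokeParser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor call that fails.
sub open_missing { return undef }

# Form from the original post: the || sits inside the argument list, so its
# left operand is a non-empty string, which is always true. die never runs.
my $arg = "file.txt" || die "Can't open file.";

# Low-precedence "or" placed after the call: the return value is what gets
# tested, so a failed call raises the error. eval traps it for this demo.
my $died = !eval {
    my $p = open_missing("file.txt") or die "Can't open file.\n";
    1;
};

print "argument passed: $arg\n";
print "or-die fired: ", ($died ? "yes" : "no"), "\n";
```

The first form hands the plain string "file.txt" to whatever is being called and can never report a failure; the second tests the result of the call itself.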

--
Sam Holden
 
 
A. Sinan Unur
 
      02-23-2005
Michael Wagg <(E-Mail Removed)> wrote in news:ejRSd.9825$rB3.2454645
@twister.nyc.rr.com:

> A. Sinan Unur wrote:
>
>>>my $p = HTML::TokeParser->new("file.txt" || die "Can't open file.");

>>
>> Cute but counter-productive. Please post real code.

>
> With the exception of the input filename (which was changed from
> "digest.html"), this is the exact code being used.


my $p = HTML::TokeParser->new("file.txt")
or die "Can't open file.";

>>>while (my $t = $p->get_tag("a")) {
>>>my $name = $t->[1]{name};
>>>next unless $name && ($name eq "anchor");


Now I realize why it doesn't return anything: There are no anchors named
'anchor' in the data you provided.

Sorry, I don't have time to look at the rest of the stuff right now.
 