Velocity Reviews


Dan Jacobson 07-24-2003 03:08 AM

simple web site mapper
 
Say, does this working simple web site mapper (vs. Eric Raymond's
extra large version) have any stuffing hanging out that perl novice me
ought to fix? Say, how does one do '/bin/sh -e' in perl? Would
rewriting it in python be as easy? Does perl have an internal "ls"
that could be called as easy? File::Find couldn't give me the ls -R
order I prefer I suppose. Goal: to use even less lines of code.

use strict;
require HTML::HeadParser;
my $dir=<~/mywebsite>; #where files are on my computer
my $name="Bob Blobkowski";
my $url='http://blobkowski.org/';
chdir $dir||die;
print <<EOF;
<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd"><HTML><TITLE>$name\'s site map</TITLE>
</HEAD><BODY><H1>$name\'s site map</H1>\n<P><A href="$url">$url</A> as of
EOF
system "date"; print '</P><HR><PRE>';
my $p = HTML::HeadParser->new; my $d;
open LS, "ls -R|"||die;
while(<LS>){ #order nicer than find(1)
chomp;
if(s@:$@@){$d=$_;next}
if(/\.(txt|html)$/){s@^@$d/@;s/..//;print "<A href=\"$_\">$_</A>\n";
if(/\.html$/){$p->parse_file($_);print "\t",$p->header('Title'),"\n";}}}
print "</PRE></BODY></HTML>";

David K. Wall 07-24-2003 03:11 PM

Re: simple web site mapper
 
Dan Jacobson <jidanni@jidanni.org> wrote:

> Say, does this working simple web site mapper (vs. Eric Raymond's
> extra large version) have any stuffing hanging out that perl
> novice me ought to fix?


My comments below are about style, because that's what really got my
attention.

> Say, how does one do '/bin/sh -e' in
> perl? Would rewriting it in python be as easy?


I don't know much about Python; I've only played with it a bit. But
it would *force* you to use whitespace, which might not be a bad
thing. :-) If you're interested, grab a copy and try it:
http://www.python.org
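
On the '/bin/sh -e' half of the question: the closest thing on current perls is probably the autodie pragma (in core since 5.10.1, so newer than this thread), which makes failing built-ins such as open and chdir throw an exception instead of quietly returning false. A minimal sketch, using a made-up directory name that is assumed not to exist:

```perl
use strict;
use warnings;
use autodie;    # open, chdir, close, ... now die on failure

# Hypothetical path, assumed not to exist on this machine.
my $ok  = eval { chdir '/no/such/dir/hopefully'; 1 };
my $err = $@;

print "chdir failed: $err" unless $ok;
```

Note that system is only covered by "use autodie qw(:all)", which needs the IPC::System::Simple module to be installed.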

> Does perl have an
> internal "ls" that could be called as easy? File::Find couldn't
> give me the ls -R order I prefer I suppose. Goal: to use even
> less lines of code.


Forget using fewer lines of code unless it makes the program more
reliable and/or readable. If I had to revise the code below, the very
first thing I would do is format it in a way that aids
comprehension instead of hindering it.

Ok, so it's not a long, complicated program. But even short programs
can benefit from a readable style. See 'perldoc perlstyle' for
suggestions.

> use strict;
> require HTML::HeadParser;
> my $dir=<~/mywebsite>; #where files are on my computer
> my $name="Bob Blobkowski";
> my $url='http://blobkowski.org/';
> chdir $dir||die;


There's a low-precedence 'or' that you can use, too.

chdir $dir or die "Cannot chdir to $dir: $!";


> print <<EOF;
><!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN"
> "http://www.w3.org/TR/html4/strict.dtd"><HTML><TITLE>$name\'s site

^^
It's not necessary to escape that single-quote after $name.

> map</TITLE>
></HEAD><BODY><H1>$name\'s site map</H1>\n<P><A href="$url">$url</A>
>as of
> EOF
> system "date"; print '</P><HR><PRE>';
> my $p = HTML::HeadParser->new; my $d;


$d? What's $d? I can figure it out from the code, but a more
descriptive name would be useful. If someone else needs to edit the
code they'll appreciate a longer name -- and you might, too, when you
come back to this program after a while and don't remember all the
details.

Newlines and spaces are not a scarce resource.

> open LS, "ls -R|"||die;
> while(<LS>){ #order nicer than find(1)
> chomp;
> if(s@:$@@){$d=$_;next}


Why use a non-standard delimiter for the substitution operator when
you don't need to? s/:$// is pretty clear; s@:$@@ is slightly
obfuscated, IMHO.

> if(/\.(txt|html)$/){s@^@$d/@;s/..//;print "<A
> href=\"$_\">$_</A>\n";


qq() would make that easier to read; you wouldn't need to escape the
double-quote each time. See 'perldoc perlop' for quote and
quote-like operators.
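
A small sketch of the difference, with a made-up file name standing in for $_:

```perl
use strict;
use warnings;

my $file = 'docs/index.html';    # hypothetical file name, for illustration

# The same string written twice: escaped double quotes vs. qq() delimiters.
my $escaped = "<A href=\"$file\">$file</A>\n";
my $quoted  = qq(<A href="$file">$file</A>\n);

print $quoted;
```

Both variables end up holding the same text; only the source is easier on the eyes.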

> if(/\.html$/){$p->parse_file($_);print
> "\t",$p->header('Title'),"\n";}}}
> print "</PRE></BODY></HTML>";
>


Just because you *can* eliminate whitespace doesn't mean you
*should*.
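
For comparison, here is roughly how the directory-walking part might look with whitespace and longer names. This is only a sketch: the sub name and variable names are my own, and it takes the lines of 'ls -R' output as a list so that it can be tried without running ls at all.

```perl
use strict;
use warnings;

# Turn the lines of `ls -R` output into a list of relative page paths.
sub pages_from_ls {
    my @lines = @_;
    my $current_dir = '.';
    my @pages;

    for my $line (@lines) {
        chomp $line;

        # A line ending in ":" names the directory whose entries follow.
        if ($line =~ s/:$//) {
            $current_dir = $line;
            next;
        }

        # Only .txt and .html files go into the map.
        next unless $line =~ /\.(?:txt|html)$/;

        my $path = "$current_dir/$line";
        $path =~ s{^\./}{};    # drop the leading "./"
        push @pages, $path;
    }

    return @pages;
}
```

Feeding it the real listing is then a matter of opening a pipe from 'ls -R' (and checking that the open worked) and passing the lines it yields to pages_from_ls.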

That may be why no one else has responded. (No one had when I posted
this, anyway.)

As for bugs, I haven't looked -- I was distracted too much by the
style.

--
David Wall

Steve Grazzini 07-24-2003 04:25 PM

Re: simple web site mapper
 
David K. Wall <usenet@dwall.fastmail.fm> wrote:
> Dan Jacobson <jidanni@jidanni.org> wrote:
>
> > Does perl have an internal "ls" that could be called as easy?
> > File::Find couldn't give me the ls -R order I prefer I suppose.


Have a look at File::Find's "preprocess" option.

% perldoc File::Find
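
A rough sketch of what that could look like. The sub name is made up, and the comparator (plain files first, then subdirectories, each group sorted by name) is my guess at what "ls -R order" means here; ls may also collate names differently depending on locale.

```perl
use strict;
use warnings;
use File::Find;

# Collect .txt and .html files under $top. Within each directory,
# plain files are visited before subdirectories, each group by name.
sub list_pages {
    my ($top) = @_;
    my @pages;

    find({
        # Called with the entries of the current directory; returns
        # them in the order they should be visited.
        preprocess => sub {
            sort { (-d $a ? 1 : 0) <=> (-d $b ? 1 : 0) or $a cmp $b } @_;
        },
        wanted => sub {
            push @pages, $File::Find::name
                if -f $_ && /\.(?:txt|html)\z/;
        },
    }, $top);

    return @pages;
}

# Run as: perl thisscript.pl ~/mywebsite
if (@ARGV) {
    print "$_\n" for list_pages($ARGV[0]);
}
```

That does away with the pipe from ls altogether, so there is one less open to check.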

> > use strict;
> > require HTML::HeadParser;
> > my $dir=<~/mywebsite>; #where files are on my computer
> > my $name="Bob Blobkowski";
> > my $url='http://blobkowski.org/';
> > chdir $dir||die;

>
> There's a low-precedence 'or' that you can use, too.
>
> chdir $dir or die "Cannot chdir to $dir: $!";


He'll *need* to use the low-precedence version or add some
parentheses.

> > open LS, "ls -R|"||die;


Same thing here:

% perl -MO=Deparse -e 'open LS, "ls -R|" || die'
open LS, 'ls -R|';
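
Spelled out as a small sketch (the directory name is a placeholder that is assumed not to exist), the || form swallows the failure, while 'or' or explicit parentheses catch it:

```perl
use strict;
use warnings;

my $dir = '/no/such/dir/hopefully';    # hypothetical, assumed not to exist

# Parses as chdir($dir || die ...): $dir is a true string, so die is
# never reached and the failed chdir goes unnoticed.
chdir $dir || die "never printed";

# Either of these tests the return value of chdir itself.
my $msg_or     = eval { chdir $dir or die "Cannot chdir to $dir: $!\n"; 1 } ? '' : $@;
my $msg_parens = eval { chdir($dir) || die "Cannot chdir to $dir: $!\n"; 1 } ? '' : $@;

print $msg_or, $msg_parens;
```

The same goes for the open: either open(LS, "ls -R|") || die, or the 'or' form without parentheses.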

--
Steve

Jeff 'japhy' Pinyan 07-24-2003 06:18 PM

Re: simple web site mapper
 
On Thu, 24 Jul 2003, David K. Wall wrote:

>Dan Jacobson <jidanni@jidanni.org> wrote:
>
>> print <<EOF;
>><!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN"
>> "http://www.w3.org/TR/html4/strict.dtd"><HTML><TITLE>$name\'s site

> ^^
>It's not necessary to escape that single-quote after $name.


Yes, it is. The ' is the pre-Perl 5 package separator, so $name's is
parsed as $name::s. It's something that bites many Perl programmers
from time to time.
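
If the backslash looks odd, braces around the variable name do the same job. A small sketch of both spellings:

```perl
use strict;
use warnings;

my $name = "Bob Blobkowski";

# Both forms keep the apostrophe out of the variable name.
my $escaped = "$name\'s site map";
my $braced  = "${name}'s site map";

print "$braced\n";
```

Either way the string interpolates $name and not a variable called s in package name.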

--
Jeff Pinyan      RPI Acacia Brother #734      2003 Rush Chairman
"And I vos head of Gestapo for ten years. Ah! Five years! Nein! No!
Oh. Was NOT head of Gestapo AT ALL!"
    Michael Palin (as Heinrich Bimmler) in: The North Minehead
    Bye-Election (Monty Python's Flying Circus)


David K. Wall 07-24-2003 06:54 PM

Re: simple web site mapper
 
Jeff 'japhy' Pinyan <pinyaj@rpi.edu> wrote:

> On Thu, 24 Jul 2003, David K. Wall wrote:
>
>>Dan Jacobson <jidanni@jidanni.org> wrote:
>>
>>> print <<EOF;
>>><!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.01//EN"
>>> "http://www.w3.org/TR/html4/strict.dtd"><HTML><TITLE>$name\'s
>>> site

>> ^^
>>It's not necessary to escape that single-quote after $name.

>
> Yes it is. $name's is pre-Perl 5 syntax for $name::s. It's
> something that bites many Perl programmers from time to time.


Damn. I knew about the pre-Perl 5 syntax, but a quick
copy/paste/edit/run of code from the OP convinced me that it didn't
matter any more. But there are *two* places in the here-doc that use
an escaped quote. I edited one and saw the output from the other.



All times are GMT.
