Newsgroups > Programming > Perl > Perl Misc


HTML whitespace/comments cruncher

 
 
Garry Heaton
      10-19-2003
Can anyone recommend a perl script for crunching HTML whitespace and
comments? I wish to make duplicates of HTML files for uploading.

Garry Heaton

 
ko
      10-19-2003
Garry Heaton wrote:
> Can anyone recommend a perl script for crunching HTML whitespace and
> comments? I wish to make duplicates of HTML files for uploading.
>
> Garry Heaton
>


Use one of the HTML parsing modules. For example:

http://search.cpan.org/~gaas/HTML-Parser-3.33/

Download and unpack the distribution, and check out the example scripts
in the 'eg' directory.

HTH - keith

 
Eric J. Roode
      10-19-2003
Garry Heaton <(E-Mail Removed)> wrote in news:zOtkb.12373$kA.3236929@wards.force9.net:

> Can anyone recommend a perl script for crunching HTML whitespace and
> comments? I wish to make duplicates of HTML files for uploading.


Why not gzip the html files? Seems to me that'd be even better.
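For reference, a minimal sketch of the gzip route in Perl, assuming the IO::Compress::Gzip module (part of core Perl since 5.10). The helper name gzip_copy and the .html.gz naming are illustrative choices, not something from this thread. Whether browsers actually receive the compressed copy depends on the web server being configured to send it with a "Content-Encoding: gzip" header, which this sketch does not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: write a maximally compressed copy of an HTML
# file next to the original (index.html -> index.html.gz) and
# return the name of the new file.
sub gzip_copy {
    my ($in) = @_;
    my $out = "$in.gz";
    gzip($in => $out, Level => 9)
        or die "gzip failed for $in: $GzipError\n";
    return $out;
}

# Compress every file named on the command line.
gzip_copy($_) for @ARGV;
```

Run it as, e.g., perl gzip_copy.pl *.html (the script name is hypothetical). The originals are left untouched, which fits the goal of making upload copies.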


A quick Google search turned up a couple of freeware and commercial HTML
strippers. And I seem to recall there's an Apache module that does it, but
I'm not sure.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

Gregory Toomey
      10-20-2003
It was a dark and stormy night, and Garry Heaton managed to scribble:

> Can anyone recommend a perl script for crunching HTML whitespace and
> comments? I wish to make duplicates of HTML files for uploading.
>
> Garry Heaton


Would you believe I saw some code on the net yesterday that did this, but now I can't find it.

The basic algorithm used regular expressions and was only a few lines long:
convert consecutive whitespace characters (spaces and tabs) to a single space
remove whitespace from the beginning of lines
convert consecutive newlines to a single newline
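A minimal Perl sketch of those three steps follows. It is a reconstruction from the description, not the code mentioned above. It assumes Unix "\n" line endings and treats "whitespace" in the first step as spaces and tabs only, since collapsing newlines there would leave nothing for the third step to do. Like any regex-only approach, it will also squash the contents of <pre> blocks.

```perl
use strict;
use warnings;

# Hypothetical helper implementing the three regex steps in order.
# Assumes "\n" line endings; "\r\n" files would need s/\r\n/\n/g first.
sub crunch_whitespace {
    my ($html) = @_;
    $html =~ s/[ \t]+/ /g;     # 1. runs of spaces/tabs become one space
    $html =~ s/^ //mg;         # 2. drop the (now single) leading space on each line
    $html =~ s/\n{2,}/\n/g;    # 3. runs of newlines become one newline
    return $html;
}
```

The order matters: step 1 turns a line holding only spaces and tabs into a single space, step 2 empties it, and step 3 then folds the resulting blank line away. The same steps fit in a one-liner, with -0777 slurping each file whole:

    perl -0777 -pe 's/[ \t]+/ /g; s/^ //mg; s/\n{2,}/\n/g' page.html > page.crunched.html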

gtoomey


 
Tad McClellan
      10-20-2003
Andrew Shitov <(E-Mail Removed)> wrote:

> Look at the code on this page: http://webcode.ru/cgi/despace1/



It has several bugs in it.

It open()s FILE, but never reads from it.

It uses an ampersand on function calls when it does not want the
semantics that go with using an ampersand on function calls.

It will mangle spaces in <pre> sections.
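A rough sketch of one way around the <pre> problem: split the document on <pre>...</pre> blocks, copy those through verbatim, and only strip comments and collapse whitespace in the text between them. The helper name crunch_html is hypothetical, and the regexes assume well-behaved markup (no nested <pre>, no "<pre" hidden inside comments, scripts, or attribute values). A real parser such as HTML::Parser copes with those cases where this does not. Note that it also deletes IE conditional comments, since they are ordinary comments to a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: drop comments and collapse whitespace everywhere
# except inside <pre>...</pre>, which is copied through unchanged.
# Assumes no nested <pre> and no "<pre" inside comments or attributes.
sub crunch_html {
    my ($html) = @_;
    my $out = '';
    # The capturing group makes split return the <pre> blocks as well
    # as the text between them.
    for my $chunk (split m{(<pre\b.*?</pre\s*>)}is, $html) {
        if ($chunk =~ m{\A<pre\b}i) {
            $out .= $chunk;                # preformatted: keep verbatim
            next;
        }
        my $text = $chunk;
        $text =~ s/<!--.*?-->//gs;         # remove comments
        $text =~ s/[ \t]+/ /g;             # collapse spaces/tabs
        $text =~ s/^ //mg;                 # strip leading space per line
        $text =~ s/\n{2,}/\n/g;            # collapse blank lines
        $out .= $text;
    }
    return $out;
}
```

Reading the whole file into one string first (for example with local $/ undefined) matters here, because a line-by-line loop would never see a multi-line comment or <pre> block in one piece.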


--
Tad McClellan SGML consulting
(E-Mail Removed)   Perl programming
Fort Worth, Texas
 


Similar Threads
Thread Thread Starter Forum Replies Last Post
firefox html, my downloaded html and firebug html different? Adam Akhtar Ruby 9 08-16-2008 07:55 PM
C source cruncher wanted David Given C Programming 7 10-17-2005 04:01 PM
How do I identify word<html><html>other word? Laura Perl 1 06-04-2004 11:32 PM
how to redirect to a frames-based html page and load the right html when coming from an ASP.NET page Mark Kamoski ASP .Net 1 08-13-2003 05:51 AM
How to use HTML::Parser to remove HTML tags and print result Mitchua Perl 1 07-15-2003 02:02 PM


