Whitespace removal in html generated by cgi
A few weeks ago a question was asked in this group about removing whitespace from html, in particular from html generated by cgi.
Here's a simple technique I developed for Linux:

1. A sample cgi. Bash uses the <<'delimiter' construct to pass the input verbatim to Perl. The output of the cgi is piped to delspace.pl, our whitespace munger.

#!/bin/bash
/usr/bin/perl <<'EOFPERL' | ./delspace.pl
#your cgi goes here
use strict;
$|++;
print "Content-type:text/html\n\n";
print " <h1> This is a test </h1> \n";
print " some more text\n";
EOFPERL

2. Now here's delspace.pl, the whitespace remover. It may be a little buggy, but it seems to work for my simple html.

#!/usr/bin/perl
my $count=0;
while(<>){
# remove leading whitespace
s/^\s+//;

# remove trailing whitespace
s/\s+$//;

# change internal whitespace to single space
s/\s+/ /g;

# remove simple one line comments
s/<!--.*?-->//;

# another simple whitespace removal
s/> </></g;

#newlines are not needed
#except for Content-type-text/html\n\n
# which occurs at the start
print;
print "\n" if $count++<4;
}

gtoomey
Re: Whitespace removal in html generated by cgi
[please limit your line lengths to 72 characters]
[please make sure your blank lines are *actually* blank]

Gregory Toomey <nospam@bigpond.com> wrote:
> A few weeks ago a question was asked in this group about removing
> whitespace from html, in particular from html generated by cgi.
> Here's a simple technique I developed for Linux:
>
> 1. A sample cgi. Bash uses the <<'delimiter' construct to pass the
> input verbatim to Perl. The output of the cgi is piped to
> delspace.pl, our whitespace munger.
>
> #!/bin/bash

There is absolutely no need to use bash. If nothing better, use the
techniques described in perldoc perlipc "Safe Pipe Opens". Better, use
a tied filehandle or a PerlIO layer on STDOUT. Or simply generate the
thing without superfluous whitespace in the first place.

<snip>
> 2. Now here's delspace.pl, the whitespace remover. It may be a
> little buggy, but it seems to work for my simple html.
>
> #!/usr/bin/perl
> my $count=0;
> while(<>){
> # remove leading whitespace
> s/^\s+//;
>
> # remove trailing whitespace
> s/\s+$//;
>
> # change internal whitespace to single space
> s/\s+/ /g;
>
> # remove simple one line comments
> s/<!--.*?-->//;
>
> # another simple whitespace removal
> s/> </></g;

You realise this changes the presentation of the HTML?

> #newlines are not needed
> #except for Content-type-text/html\n\n
> # which occurs at the start
> print;
> print "\n" if $count++<4;

Why 4?

> }

'A little buggy'? The whole idea's fundamentally flawed: you need to
start by separating the HTTP from the HTML from the data, which means
using an HTML parsing module. For instance, what about this:

<link rel=stylesheet type="text/css" href="..."/>

Or this:

Status: 302 Found
Location: ...
Content-encoding: ...
Content-type: text/html
Content-length: ...

<html>...

Or this:

<pre>
#!/usr/bin/perl

use warnings;
use strict;

print "Hello world\n";
</pre>

Ben

--
I've seen things you people wouldn't believe: attack ships on fire off
the shoulder of Orion; I've watched C-beams glitter in the darkness near
the Tannhauser Gate. All these moments will be lost, in time, like tears
in rain. Time to die.
|-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-| ben@morrow.me.uk
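Ben's three counter-examples come down to two requirements: pass the HTTP header block through untouched, and leave <pre> sections alone. The following is only a rough sketch of that idea using core Perl, not the HTML parsing module Ben recommends. The helper names split_response and squeeze_html are made up for illustration, and the regex-based splitting will still be fooled by comments, scripts and other markup that a real parser would handle.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split a CGI response into header block and body.
# The header block ends at the first blank line (LF or CRLF).
sub split_response {
    my ($response) = @_;
    my ($headers, $body) = split /\r?\n\r?\n/, $response, 2;
    return ($headers, defined $body ? $body : '');
}

# Hypothetical helper: collapse whitespace runs to one space, but leave
# <pre> and <textarea> sections exactly as they are.
sub squeeze_html {
    my ($html) = @_;
    # A capturing split keeps the protected sections as odd-numbered chunks.
    my @chunks = split m{(<pre\b.*?</pre\s*>|<textarea\b.*?</textarea\s*>)}is, $html;
    my $out = '';
    for my $i (0 .. $#chunks) {
        my $chunk = $chunks[$i];
        $chunk =~ s/\s+/ /g if $i % 2 == 0;    # even chunks are ordinary markup
        $out .= $chunk;
    }
    return $out;
}

# Made-up sample response with more than one header line and a <pre> block.
my $response = join '',
    "Status: 200 OK\n",
    "Content-type: text/html\n",
    "\n",
    "<h1>   This is   a test   </h1>\n",
    "<pre>\n#!/usr/bin/perl\nprint \"Hello world\\n\";\n</pre>\n",
    "   some more text\n";

my ($headers, $body) = split_response($response);
print $headers, "\n\n", squeeze_html($body), "\n";
```

Note that the sketch deliberately never deletes whitespace outright, it only shrinks a run to a single space, so it avoids the change in presentation that s/> </></g causes.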
Re: Whitespace removal in html generated by cgi
It was a dark and stormy night, and Ben Morrow managed to scribble:
> [please limit your line lengths to 72 characters]
> [please make sure your blank lines are *actually* blank]
>
> Gregory Toomey <nospam@bigpond.com> wrote:
>> A few weeks ago a question was asked in this group about removing
>> whitespace from html, in particular from html generated by cgi.
>> Here's a simple technique I developed for Linux:
>>
>> 1. A sample cgi. Bash uses the <<'delimiter' construct to pass the
>> input verbatim to Perl. The output of the cgi is piped to
>> delspace.pl, our whitespace munger.
>>
>> #!/bin/bash
>
> There is absolutely no need to use bash. If nothing better, use the
> techniques described in perldoc perlipc "Safe Pipe Opens". Better, use
> a tied filehandle or a PerlIO layer on STDOUT. Or simply generate the
> thing without superfluous whitespace in the first place.
>

The technique I described allows you to take an existing cgi & change 2 lines at the top & one at the bottom. What you described will work, but it's more complicated.

> <snip>
>> 2. Now here's delspace.pl, the whitespace remover. It may be a
>> little buggy, but it seems to work for my simple html.
>>
>> #!/usr/bin/perl
>> my $count=0;
>> while(<>){
>> # remove leading whitespace
>> s/^\s+//;
>>
>> # remove trailing whitespace
>> s/\s+$//;
>>
>> # change internal whitespace to single space
>> s/\s+/ /g;
>>
>> # remove simple one line comments
>> s/<!--.*?-->//;
>>
>> # another simple whitespace removal
>> s/> </></g;
>
> You realise this changes the presentation of the HTML?
>
>> #newlines are not needed
>> #except for Content-type-text/html\n\n
>> # which occurs at the start
>> print;
>> print "\n" if $count++<4;
>
> Why 4?
>
>> }
>
> 'A little buggy'? The whole idea's fundamentally flawed: you need to
> start by separating the HTTP from the HTML from the data, which means
> using an HTML parsing module. For instance, what about this:
>

It worked with all the cgis I've created. It's just a simple pragmatic way to solve a real world problem.

gtoomey
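For comparison, here is one way the in-process alternative might look. It is a sketch only: instead of the tied filehandle or PerlIO layer Ben mentions, it swaps in a simpler technique, pointing the default output handle at an in-memory buffer with select. The names capture_output and filter_response are hypothetical, and the filter body is a placeholder. It only catches plain print calls, not print STDOUT or output from child processes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code with the default output handle pointed at
# an in-memory buffer, then return everything it printed.
sub capture_output {
    my ($code) = @_;
    my $buffer = '';
    open my $mem, '>', \$buffer or die "cannot open in-memory handle: $!";
    my $previous = select $mem;        # plain print now goes to $mem
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    select $previous;                  # always restore the old handle
    close $mem;
    die $err unless $ok;
    return $buffer;
}

# Placeholder filter: leave the headers alone, collapse whitespace in the body.
sub filter_response {
    my ($response) = @_;
    my ($headers, $body) = split /\n\n/, $response, 2;
    $body = '' unless defined $body;
    $body =~ s/\s+/ /g;
    return "$headers\n\n$body\n";
}

my $page = capture_output(sub {
    # the existing cgi body would go here
    print "Content-type: text/html\n\n";
    print "   <h1>   This is a test   </h1>   \n";
    print "     some more text\n";
});

print filter_response($page);
```

As with the bash wrapper, an existing cgi needs only a couple of lines added at the top and bottom, but no second perl process is started.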
Re: Whitespace removal in html generated by cgi
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gregory Toomey <nospam@bigpond.com> wrote in
news:1933712.m1tGeoNVPB@gregs-web-hosting-and-pickle-farming:

> A few weeks ago a question was asked in this group about removing
> whitespace from html, in particular from html generated by cgi. Here's
> a simple technique I developed for Linux:

What is the goal of this? Reducing the amount of data that is
transmitted to the client browser? If so, you would probably be better
off compressing the output with gzip -- all major browsers support gzip
compressed data.

[...]
> #newlines are not needed
> #except for Content-type-text/html\n\n
> # which occurs at the start
> print;
> print "\n" if $count++<4;

Newlines are needed in <pre>...</pre> sections, and sometimes in
<textarea>...</textarea> sections.

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7f0GWPeouIeTNHoEQKoQACg4qJhX/JKb6y7ZCOK9eiMVqXih9EAn2px
YT5a72WavpE6GErYnLOzUQ+d
=zRRz
-----END PGP SIGNATURE-----
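Eric's gzip suggestion is easy to check on your own pages. The sketch below uses the core IO::Compress::Gzip module to compare four sizes: original, whitespace-squeezed, and each of those gzipped. The helper name gzipped_size and the sample table rows are invented for illustration, so the numbers it prints say nothing about any real site.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper: return the gzip-compressed size of a string in bytes.
sub gzipped_size {
    my ($data) = @_;
    my $compressed;
    gzip \$data => \$compressed
        or die "gzip failed: $GzipError\n";
    return length $compressed;
}

# Made-up sample page: 200 heavily indented table rows.
my $html = join '',
    map { "        <tr>\n            <td>row $_</td>\n        </tr>\n" } 1 .. 200;

# Collapse every whitespace run to a single space.
(my $squeezed = $html) =~ s/\s+/ /g;

printf "%-22s %6d bytes\n", 'original',            length $html;
printf "%-22s %6d bytes\n", 'whitespace squeezed', length $squeezed;
printf "%-22s %6d bytes\n", 'original, gzipped',   gzipped_size($html);
printf "%-22s %6d bytes\n", 'squeezed, gzipped',   gzipped_size($squeezed);
```

Runs of repeated indentation are exactly what gzip compresses well, so the interesting comparison is between the last two lines of output rather than the first two.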
Re: Whitespace removal in html generated by cgi
On Sun, 16 Nov 2003, Eric J. Roode wrote:
>> #newlines are not needed
>> #except for Content-type-text/html\n\n
>> # which occurs at the start
>> print;
>> print "\n" if $count++<4;
>
>Newlines are needed in <pre>...</pre> sections, and sometimes in
><textarea>...</textarea> sections.

Not to mention that, although most HTML renders multiple whitespace as a
SINGLE space, a SINGLE newline IS needed, because the browser will render
it as a space. That is, "foo\nbar" is rendered as "foo bar", while a
string like "foo \n bar" is also just rendered as "foo bar".

--
Jeff Pinyan RPI Acacia Brother #734 2003 Rush Chairman
"And I vos head of Gestapo for ten | Michael Palin (as Heinrich Bimmler)
years. Ah! Five years! Nein! No! | in: The North Minehead Bye-Election
Oh. Was NOT head of Gestapo AT ALL!" | (Monty Python's Flying Circus)
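Jeff's point can be reproduced in a few lines. In delspace.pl, every line after the first four is trimmed at both ends and printed without a newline, so words that were separated only by a line break end up fused. The sketch below contrasts that with collapsing each whitespace run, newlines included, to one space. Both sub names are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The approach from the original script once $count has passed 4:
# trim both ends of each line and emit it with no newline.
sub join_without_newlines {
    my (@lines) = @_;
    my $out = '';
    for my $line (@lines) {
        $line =~ s/^\s+//;
        $line =~ s/\s+$//;
        $out .= $line;
    }
    return $out;
}

# Hypothetical alternative: collapse every whitespace run, newlines
# included, to a single space so word boundaries survive.
sub collapse_whitespace {
    my (@lines) = @_;
    my $text = join '', @lines;
    $text =~ s/\s+/ /g;
    return $text;
}

my @lines = ("foo\n", "bar\n");
print join_without_newlines(@lines), "\n";    # prints "foobar"
print collapse_whitespace(@lines), "\n";      # prints "foo bar "
```

The second version gives up a byte per line compared with the first, but the browser renders the same text as before.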
Re: Whitespace removal in html generated by cgi
It was a dark and stormy night, and Eric J. Roode managed to scribble:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Gregory Toomey <nospam@bigpond.com> wrote in
> news:1933712.m1tGeoNVPB@gregs-web-hosting-and-pickle-farming:
>
>> A few weeks ago a question was asked in this group about removing
>> whitespace from html, in particular from html generated by cgi. Here's
>> a simple technique I developed for Linux:
>
> What is the goal of this? Reducing the amount of data that is
> transmitted to the client browser?

Yes.

> If so, you would probably be better
> off compressing the output with gzip -- all major browsers support gzip
> compressed data.

Yes, I use Apache with gzip, so that's another level of compression. People hate waiting for pages to load, especially people on dialup.

>
> [...]
>> #newlines are not needed
>> #except for Content-type-text/html\n\n
>> # which occurs at the start
>> print;
>> print "\n" if $count++<4;
>
> Newlines are needed in <pre>...</pre> sections, and sometimes in
> <textarea>...</textarea> sections.
>
> - --
> Eric
> $_ = reverse sort $ /. r , qw p ekca lre uJ reh
> ts p , map $ _. $ " , qw e p h tona e and print
>
> -----BEGIN PGP SIGNATURE-----
> Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>
>
> iQA/AwUBP7f0GWPeouIeTNHoEQKoQACg4qJhX/JKb6y7ZCOK9eiMVqXih9EAn2px
> YT5a72WavpE6GErYnLOzUQ+d
> =zRRz
> -----END PGP SIGNATURE-----
Re: Whitespace removal in html generated by cgi
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jeff 'japhy' Pinyan <pinyaj@rpi.edu> wrote in
news:Pine.SGI.3.96.1031116171158.181912A-100000@vcmr-64.server.rpi.edu:

> Not to mention that, although most HTML renders multiple whitespace as a
> SINGLE space, a SINGLE newline IS needed, because the browser will render
> it as a space. That is, "foo\nbar" is rendered as "foo bar", while a
> string like "foo \n bar" is also just rendered as "foo bar".

Ooh, good point.

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7gZY2PeouIeTNHoEQJuPwCePA4BQ8lKxNoFVeJK7PeCK7vOgaUAn1xC
xlc/HAuS24OiXl9X1RTYqVPZ
=iONd
-----END PGP SIGNATURE-----
Re: Whitespace removal in html generated by cgi
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Gregory Toomey <nospam@bigpond.com> wrote in
news:3072218.31r3eYUQgx@gregs-web-hosting-and-pickle-farming:

>
> People hate waiting for pages to load, especially people on dialup.

Have you verified that the extra time your CGI scripts take to execute is
less than the transfer time of the spaces you are eliminating? :-)

- --
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

-----BEGIN PGP SIGNATURE-----
Version: PGPfreeware 7.0.3 for non-commercial use <http://www.pgp.com>

iQA/AwUBP7gZyWPeouIeTNHoEQJc6QCfRsU9IVVvuPbf1LCJ65Ot7K+TVJUAnRXm
MizOFx2ThfFeAocFzgE/LLZ/
=fWE0
-----END PGP SIGNATURE-----
Re: Whitespace removal in html generated by cgi
It was a dark and stormy night, and Eric J. Roode managed to scribble:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Gregory Toomey <nospam@bigpond.com> wrote in
> news:3072218.31r3eYUQgx@gregs-web-hosting-and-pickle-farming:
>
>>
>> People hate waiting for pages to load, especially people on dialup.
>
> Have you verified that the extra time your CGI scripts take to execute is
> less than the transfer time of the spaces you are eliminating? :-)
>

The server I use for cgi is about 2.6GHz and averages 20% CPU utilisation. Running the script to remove whitespace takes under 1 second for 1000 lines of HTML, and does not increase the load to any discernible extent. The database-driven cgi I use is disk IO bound, not CPU bound.

gtoomey
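A measurement along the lines Eric asks for could be sketched as follows, using the core Time::HiRes module. The sub squeeze_line applies the same substitutions as delspace.pl to a single line, and the 1000 input lines are invented, so the timing and byte counts are only indicative. The sketch times the substitutions alone. It does not include the cost of starting a second perl process for every request, which the piped setup also pays.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# The per-line substitutions from delspace.pl, applied to one line.
sub squeeze_line {
    my ($line) = @_;
    $line =~ s/^\s+//;          # leading whitespace
    $line =~ s/\s+$//;          # trailing whitespace, newline included
    $line =~ s/\s+/ /g;         # internal runs to one space
    $line =~ s/<!--.*?-->//;    # first one-line comment
    $line =~ s/> </></g;        # single space between adjacent tags
    return $line;
}

# Made-up input: 1000 lines of indented HTML with a comment on each.
my @lines = map { "      <td>   cell   $_   </td>   <!-- note -->  \n" } 1 .. 1000;

my $start        = [gettimeofday];
my $bytes_before = 0;
my $bytes_after  = 0;
for my $line (@lines) {
    $bytes_before += length $line;
    $bytes_after  += length(squeeze_line($line));
}
my $elapsed = tv_interval($start);

printf "filtered %d lines in %.6f s, saved %d bytes\n",
    scalar @lines, $elapsed, $bytes_before - $bytes_after;
```

Dividing the bytes saved by the client's line speed gives the transfer time to weigh against the elapsed figure plus process start-up.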
Re: Whitespace removal in html generated by cgi
It was a dark and stormy night, and Eric J. Roode managed to scribble:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jeff 'japhy' Pinyan <pinyaj@rpi.edu> wrote in
> news:Pine.SGI.3.96.1031116171158.181912A-100000@vcmr-64.server.rpi.edu:
>
>> Not to mention that, although most HTML renders multiple whitespace as a
>> SINGLE space, a SINGLE newline IS needed, because the browser will render
>> it as a space. That is, "foo\nbar" is rendered as "foo bar", while a
>> string like "foo \n bar" is also just rendered as "foo bar".
>
> Ooh, good point.
>

I tried it on a dozen cgis and it worked. To make this foolproof you need to write an HTML parser - this is left as an exercise for the reader!

gtoomey