Velocity Reviews


Errol Smith 09-18-2004 07:58 AM

html compression tools (command line)
 
Hi,

Does anyone know of command line tools for html compression?
The only one I am aware of is htmlcrunch
(http://www.markusstengel.de/htmlcr.html) but, frankly, this does not
perform very well (often makes the input file bigger!).
This is for my website compression tool 'webpack'
(http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for
which I am trying to avoid writing a better html compressor myself.
Rather not re-invent the wheel, you know.
Also if there is any information around on making html more
compressible, I would appreciate pointers to information/tools (the
only method I've heard of is making all html tags lower case, but
there may be other methods).
Any assistance appreciated!

Errol Smith
errol <at> ros (dot) com [period] au

rf 09-18-2004 08:17 AM

Re: html compression tools (command line)
 

"Errol Smith" <email@see.signature.com> wrote in message
news:ivpnk01vih84k1jvr77kc82qomlog9ttt9@4ax.com...
> Hi,
>
> Does anyone know of command line tools for html compression?


This has been discussed here before. The general consensus is that it is a
waste of time. Look to other things first:

Image compression: a badly compressed image will waste far more bandwidth
than compressing the HTML will save.

Number of images: 10 images on a page results in 10 round trips back to the
server, an elapsed time of hundreds of milliseconds, perhaps even a number
of seconds. Compressing the HTML might save ten or so milliseconds.

> This is for my website compression tool 'webpack'
> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),


I note that you don't compress *this* page :-) You even have great sequences
of cr/lf in there.

--
Cheers
Richard.



Jim Higson 09-18-2004 11:43 AM

Re: html compression tools (command line)
 
Errol Smith wrote:

> Does anyone know of command line tools for html compression?

gzip is command line!
With mod_gzip or mod_gunzip on an Apache server all your pages are sent
gzipped, completely transparently to most browsers (even IE!) but expanded
for the very few that can't handle content-encoding gzip.

This will reduce page size by about 60%, but should be in addition to, rather
than instead of, compact markup.
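
A rough sketch of the negotiation described above, in Python rather than Apache configuration: the server compresses only when the client's Accept-Encoding header lists gzip, and otherwise sends the page unchanged. The function name encode_body and the sample page are made up for illustration, and the header parsing is deliberately naive (it ignores q-values such as "gzip;q=0"), so treat this as an illustration under those assumptions, not as what mod_gzip actually does.

```python
import gzip
from typing import Optional, Tuple


def encode_body(body: bytes, accept_encoding: str) -> Tuple[bytes, Optional[str]]:
    """Return (payload, Content-Encoding value or None).

    Hypothetical helper: naive parsing that ignores q-values.
    """
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        return gzip.compress(body), "gzip"
    # Client did not offer gzip: send the page as-is.
    return body, None


# Invented, highly repetitive sample page; real pages compress less well.
page = (
    "<html><body>\n"
    + '<p class="item">Hello, world!</p>\n' * 200
    + "</body></html>\n"
).encode("utf-8")

gz_payload, gz_header = encode_body(page, "gzip, deflate")
plain_payload, plain_header = encode_body(page, "")

print("original bytes:", len(page))
print("gzipped bytes: ", len(gz_payload), "Content-Encoding:", gz_header)
print("fallback bytes:", len(plain_payload), "Content-Encoding:", plain_header)
```

The actual saving depends heavily on the page, so measure on real content before relying on any particular figure.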

Regardless of compression, I try to keep pages below 10k. As others have
said, it is easy to have images larger than this. Well-compressed 8-bit
PNGs and JPEGs should help here.

http://www.innerjoin.org/apache-compression/howto.html

Jim Higson 09-18-2004 11:52 AM

Re: html compression tools (command line)
 
Jim Higson wrote:

> Errol Smith wrote:
>
>> Also if there is any information around on making html more
>> compressible, I would appreciate pointers to information/tools (the
>> only method I've heard of is making all html tags lower case, but
>> there may be other methods).


Incidentally, I'd forget about making pages more compressible prior to
gzipping; information theory is not on your side.

I'd also forget about writing a better compression tool than gzip, unless
you are SERIOUSLY into maths. 7zip might give slightly better results in
some cases, but AFAIK no browsers accept it as a Content-Encoding.

Jim Higson 09-18-2004 12:01 PM

Re: html compression tools (command line)
 
Jim Higson wrote:

> I'd also forget about writing a better compression tool than gzip, unless
> you are SERIOUSLY into maths.


Man, gotta stop replying to myself, but...

Check out Perl's HTML::Clean; it already does much of what you are (maybe)
trying to do. Anyone on an Apache server can use it as a filter for dynamic
content, or apply it offline for static pages.

http://www.perl.com/pub/a/2003/04/17/filters.html

Errol Smith 09-19-2004 01:32 AM

Re: html compression tools (command line)
 
On Sat, 18 Sep 2004 08:17:12 GMT, "rf" <rf@.invalid> wrote:

>> Does anyone know of command line tools for html compression?

>
>This has been discussed here before. The general consensus is that it is a
>waste of time. Look to other things first: Image compression. A badly
>compressed image will waste far more bandwidth than compressing the HTML
>will save; Number of images: 10 images on a page results in 10 round trips
>back to the server, an elapsed time of hundreds of milliseconds, perhaps even
>a number of seconds. Compressing the HTML might save ten or so milliseconds.


I know, I read previous posts, but I will not be discouraged, as I
believe in the every-byte-counts theory :-)
My tool is intended to cover all bases anyway - it already optimises
JPG, GIF & PNG images. My tool's aim is to be the last step prior to
publishing, just to automatically shave off a few K here & there. It
won't help you if you save your JPGs with 100% quality and use WORD
as your html editor ;)

>> This is for my website compression tool 'webpack'
>> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),

>
>I note that you don't compress *this* page :-) You even have great sequences
>of cr/lf in there.


Actually I _do_, but like I said, htmlcrunch is not very good :-)

Errol Smith
errol <at> ros (dot) com [period] au

Errol Smith 09-19-2004 02:02 AM

Re: html compression tools (command line)
 
On Sat, 18 Sep 2004 13:01:45 +0100, Jim Higson wrote:
>>> gzip is command line!
>>> With mod_gzip or mod_gunzip on an Apache server all your pages are sent
>>> gzipped, completely transparently to most browsers (even IE!) but
>>> expanded for the very few that can't handle content-encoding gzip.
....
>> I'd also forget about writing a better compression tool than gzip, unless
>> you are SERIOUSLY into maths.
....

Jim,

Thank you very much for your (multiple) replies!
I am aware of the gzip functionality in webservers/browsers, I am
more interested in html cleaning/optimising (ie. "compact markup").
This means that browsers and/or servers not supporting those encoding
methods still benefit, plus even with gzip encoding, the resultant
compressed file will still be smaller than if the html had not been
compacted first.
As for the "making more compressible" I know this is a niche topic
and there is probably not much to be gained but it interests me anyway
:-) (I do have some knowledge/experience of compression). I can see
how making the case of all tags consistent would improve compression
(more dictionary matches), but there may be more to it.
Oh, and I am definitely NOT looking to write a new compressor like
gzip etc, only an HTML compacter :-) (7zip is very good but not fully
cross platform yet. If there is going to be any new kind of encoding
standard I would expect it to be .bz2, though it may not be suitable
for on-the-fly compression due to its large block size).
Perl's HTML::Clean looks like what I need - I will have to experiment
with it (but first remember how to use Perl! :-)
Thanks again, I will keep hunting.
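
For anyone who wants to try the tag-case idea, here is a minimal Python sketch: it lower-cases tag names with a regular expression and prints the deflate-compressed size before and after. The helper names (lowercase_tags, deflated_size) and the sample markup are invented for illustration. The regex is naive and would also rewrite something like "a<B" inside an inline script, so it is only a starting point for experiments, not a safe tool.

```python
import re
import zlib

# Matches the name of an opening or closing tag, e.g. "<TABLE" or "</Td".
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def lowercase_tags(html: str) -> str:
    """Lower-case tag names only; attributes and text are left untouched."""
    return TAG_NAME.sub(lambda m: m.group(1) + m.group(2).lower(), html)


def deflated_size(html: str) -> int:
    """Size in bytes after zlib (deflate) compression at the highest level."""
    return len(zlib.compress(html.encode("utf-8"), 9))


# Invented sample with inconsistent tag case.
mixed = "<TABLE><tr><TD>one</td><Td>two</TD></TR></table>\n" * 50
normalized = lowercase_tags(mixed)

print("mixed case:", deflated_size(mixed), "bytes compressed")
print("lower case:", deflated_size(normalized), "bytes compressed")
```

Any difference is likely to be small, so run it against your own pages to see whether it is worth the bother.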

Errol Smith
errol <at> ros (dot) com [period] au

Jim Higson 09-19-2004 02:20 PM

Re: html compression tools (command line)
 
Errol Smith wrote:

> I am aware of the gzip functionality in webservers/browsers, I am
> more interested in html cleaning/optimising (ie. "compact markup").

I have a better idea of what you're trying to do now. I quite like the idea.
I don't think very well made pages could be shrunk much, but for some guy's
homepage you might be onto something. Some ideas:

* Replacing class and id names with single letter identifiers in the html
and css? Might not save much if the file is gzipped since they're repeated
strings anyway, but might be worth a few bytes. Will also make the code
harder to read, so personally I would avoid it.
* Replacing long URLs to pages on the same site with hrefs to symlinks on
the server, with much smaller names? Static pages only, I'm afraid.
* Lossy PNG compression (google for it!) and conversion of PNGs to indexed
colour.
* Stripping comments, lf and cr. I don't like this much because I think you
should be able to look at the html of a site, but it would save a little
space.
* A thumbnail maker that makes the thumbs from lossless versions of the
artwork, not the published jpeg version, so images aren't compressed twice.
I do this on sites I create; my thumbs are a *little* smaller and a *tiny*
bit higher quality because of it.
* Automatic replacing of img tags with objects, where it is more compact, or
with divs and css where the image isn't content. Not sure how you could
tell, mind.
* Decision-tree induction to convert font tags to css. A lot of bad code
could be made smaller this way.
* Moving embedded css into a separate file, where the same rules are used on
several pages.
* Check out advpng, which shrinks PNG images down by a few percent or so.
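
The comment and cr/lf stripping idea from the list can be sketched in a few lines of Python. This is a deliberately naive illustration, not how HTML::Clean or htmlcrunch work: the function name compact_html and the sample page are made up, and the regexes would damage pre and textarea content, inline scripts, and IE conditional comments, all of which a real tool has to handle.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # non-greedy, spans lines
WHITESPACE = re.compile(r"\s+")                 # runs of spaces, tabs, cr, lf


def compact_html(html: str) -> str:
    """Strip comments, then collapse every whitespace run to one space."""
    without_comments = COMMENT.sub("", html)
    return WHITESPACE.sub(" ", without_comments).strip()


# Invented sample page.
page = """<html>
  <!-- navigation -->
  <body>
    <p>Hello,   world!</p>
  </body>
</html>
"""

compacted = compact_html(page)
print(len(page), "->", len(compacted), "bytes")
print(compacted)
```

Collapsing to a single space instead of removing whitespace entirely keeps inline text rendering unchanged in most cases, at the cost of a few bytes.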

Ok!

Sam Hughes 09-19-2004 07:23 PM

Re: html compression tools (command line)
 
Jim Higson <jh@333.org> wrote in
news:dM6dnemdMck1CNDcRVn-vw@eclipse.net.uk:


> * Check out advpng, shrinks PNG images down by few percent or so.


How does it compare to PNGOUT?

Jim Higson 09-19-2004 08:06 PM

Re: html compression tools (command line)
 
Sam Hughes wrote:

> Jim Higson <jh@333.org> wrote in
> news:dM6dnemdMck1CNDcRVn-vw@eclipse.net.uk:
>
>
>> * Check out advpng, shrinks PNG images down by few percent or so.

>
> How does it compare to PNGOUT?


I just did a little test with the 9 small images you see at the top of my
client's page here:

http://www.masmodels.com/portfolio

advpng : 57.1k
pngout : 55.7k (97.5% of advpng)

so pngout is *slightly* better

However, I sometimes run my scripts on a Linux/PPC computer, so programs
distributed as i386 binary only are not much use to me.

About pngout - I don't think I'll use it. Personally I don't much like
software I can't modify, and like even less being directed to a 38k HTML
file (plus images), where I am asked to wait in line for a 28k download!
--
Jim

