html compression tools (command line)
Hi,

Does anyone know of command line tools for html compression? The only one I am aware of is htmlcrunch (http://www.markusstengel.de/htmlcr.html) but, frankly, it does not perform very well (it often makes the input file bigger!).

This is for my website compression tool 'webpack' (http://www.kludgesoft.com/nix/webpack.html - blatant plug :), for which I am trying to avoid writing a better html compressor myself. I'd rather not re-invent the wheel, you know.

Also, if there is any information around on making html more compressible, I would appreciate pointers to information/tools (the only method I've heard of is making all html tags lower case, but there may be other methods).

Any assistance appreciated!

Errol Smith
errol <at> ros (dot) com [period] au
Re: html compression tools (command line)
"Errol Smith" <email@see.signature.com> wrote in message news:ivpnk01vih84k1jvr77kc82qomlog9ttt9@4ax.com...
> Does anyone know of command line tools for html compression?

This has been discussed here before. The general consensus is that it is a waste of time. Look to other things first:

Image compression. A badly compressed image will waste far more bandwidth than compressing the HTML will save.

Number of images. 10 images on a page results in 10 round trips back to the server, an elapsed time of hundreds of milliseconds, perhaps even a number of seconds. Compressing the HTML might save ten or so milliseconds.

> This is for my website compression tool 'webpack'
> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),

I note that you don't compress *this* page :-) You even have great sequences of cr/lf in there.

--
Cheers
Richard.
Re: html compression tools (command line)
Errol Smith wrote:
> Does anyone know of command line tools for html compression?

gzip is command line!

With mod_gzip or mod_gunzip on an Apache server all your pages are sent gzipped, completely transparently to most browsers (even IE!) but expanded for the very few that can't handle Content-Encoding: gzip.

This will reduce page size by about 60%, but should be in addition to, rather than instead of, compact markup.

Regardless of compression, I try to keep pages below 10k. Like others have said, it is easy to have images larger than this size. Well compressed 8-bit PNGs and JPEGs should help here.

http://www.innerjoin.org/apache-compression/howto.html
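For a rough feel of what gzip does to a page before touching any server configuration, the saving can be measured offline. The following is a minimal sketch in Python using the standard-library gzip module; the function name gzip_savings and the sample markup are made up for illustration, and the ratio on a real page will differ from this artificially repetitive input.

```python
import gzip


def gzip_savings(html: bytes, level: int = 9) -> tuple:
    """Return (original size, gzipped size, fraction saved) for one page."""
    compressed = gzip.compress(html, compresslevel=level)
    saved = 1 - len(compressed) / len(html)
    return len(html), len(compressed), saved


# Hypothetical sample page: highly repetitive, so it compresses unusually well.
sample = (
    b"<html><body>"
    + b"<p class='item'>Hello, world</p>\n" * 200
    + b"</body></html>"
)

original, packed, saved = gzip_savings(sample)
print(f"{original} bytes -> {packed} bytes ({saved:.0%} saved)")
```

Running the same function over a site's real pages, before and after compacting the markup, shows whether compaction still pays off once gzip is applied.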
Re: html compression tools (command line)
Jim Higson wrote:
> gzip is command line!
> With mod_gzip or mod_gunzip on an Apache server all your pages are sent
> gzipped, completely transparently to most browsers (even IE!)

Incidentally, I'd forget about making pages more compressible prior to gzipping; information theory is not on your side.

I'd also forget about writing a better compression tool than gzip, unless you are SERIOUSLY into maths. 7zip might give slightly better results in some cases, but AFAIK no browsers accept it as a Content-Encoding.
Re: html compression tools (command line)
Jim Higson wrote:
> I'd also forget about writing a better compression tool than gzip, unless
> you are SERIOUSLY into maths.

Man, gotta stop replying to myself, but...

Check out Perl's HTML::Clean; it already does much of what you are (maybe) trying to do. Anyone on an Apache server can use it as a filter for dynamic content, or apply it offline for static pages.

http://www.perl.com/pub/a/2003/04/17/filters.html
Re: html compression tools (command line)
On Sat, 18 Sep 2004 08:17:12 GMT, "rf" <rf@.invalid> wrote:
>> Does anyone know of command line tools for html compression?
>
>This has been discussed here before. The general consensus is that it is a
>waste of time.

I know, I read the previous posts, but I will not be discouraged, as I believe in the every-byte-counts theory :-)

My tool is intended to cover all bases anyway - it already optimises JPG, GIF & PNG images. Its aim is to be the last step prior to publishing, just to automatically shave off a few K here & there. It won't help you if you save your JPGs with 100% quality and use WORD as your html editor ;)

>> This is for my website compression tool 'webpack'
>> (http://www.kludgesoft.com/nix/webpack.html - blatant plug :),
>
>I note that you don't compress *this* page :-) You even have great sequences
>of cr/lf in there.

Actually I _do_, but like I said, htmlcrunch is not very good :-)

Errol Smith
errol <at> ros (dot) com [period] au
Re: html compression tools (command line)
On Sat, 18 Sep 2004 13:01:45 +0100, Jim Higson wrote:
>check out Perl's HTML::Clean it already does much of what you are (maybe)
>trying to do. Anyone on an Apache server can use it as a filter for dynamic
>content, or apply it offline for static pages.
>
>http://www.perl.com/pub/a/2003/04/17/filters.html

Jim,

Thank you very much for your (multiple) replies!

I am aware of the gzip functionality in webservers/browsers; I am more interested in html cleaning/optimising (i.e. "compact markup"). This means that browsers and/or servers not supporting those encoding methods still benefit, plus even with gzip encoding, the resultant compressed file will still be smaller than if the html had not been compacted first.

As for the "making more compressible" part, I know this is a niche topic and there is probably not much to be gained, but it interests me anyway :-) (I do have some knowledge/experience of compression). I can see how making the case of all tags consistent would improve compression (more dictionary matches), but there may be more to it.

Oh, and I am definitely NOT looking to write a new compressor like gzip etc, only an HTML compacter :-) (7zip is very good but not fully cross platform yet. If there is going to be any new kind of encoding standard I would expect it to be .bz2, though it may not be suitable for on-the-fly compression due to its large block size).

Perl's HTML::Clean looks like what I need - I will have to experiment with it (but first remember how to use Perl! :-)

Thanks again, I will keep hunting.

Errol Smith
errol <at> ros (dot) com [period] au
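The point about consistent tag case giving the compressor more dictionary matches can be tried out directly. What follows is only a sketch, assuming Python's standard re and zlib modules; lowercase_tags, deflated_size and the sample string are hypothetical names and data, and the regex is deliberately naive (it would also rewrite a "<B" that appears inside a script or comment). Whether the lower-cased version comes out smaller, and by how much, depends on the page, so the sizes are printed rather than promised.

```python
import re
import zlib

# Matches the start of an opening or closing tag, e.g. "<P" or "</DIV".
TAG_NAME = re.compile(r"</?[A-Za-z][A-Za-z0-9]*")


def lowercase_tags(html: str) -> str:
    """Lower-case tag names only; attributes and text are left untouched."""
    return TAG_NAME.sub(lambda m: m.group(0).lower(), html)


def deflated_size(html: str) -> int:
    """Size in bytes after zlib (deflate) compression at the highest level."""
    return len(zlib.compress(html.encode("utf-8"), 9))


# Hypothetical markup with inconsistent tag case.
mixed = "<P>one</P><p>two</p><DIV>three</div><Div>four</DIV>"
normalised = lowercase_tags(mixed)

print("mixed case:", deflated_size(mixed), "bytes deflated")
print("lower case:", deflated_size(normalised), "bytes deflated")
```

On a fragment this short the difference is tiny; the comparison is more telling on whole pages.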
Re: html compression tools (command line)
Errol Smith wrote:
> I am aware of the gzip functionality in webservers/browsers, I am
> more interested in html cleaning/optimising (ie. "compact markup").

I have a better idea of what you're trying to do now. I quite like the idea. I don't think very well made pages could be shrunk much, but for some guy's homepage you might be onto something.

Some ideas:

* Replacing class and id names with single letter identifiers in the html and css? Might not save much if the file is gzipped since they're repeated strings anyway, but might be worth a few bytes. Will also make the code harder to read, so personally I would avoid it.

* Replacing long URLs to pages on the same site with hrefs to symlinks on the server, with much smaller names? Static pages only, I'm afraid.

* Lossy PNG compression (google for it!) and conversion of PNGs to indexed.

* Stripping comments, lf and cr. I don't like this much because I think you should be able to look at the html of a site, but it would save a little space.

* A thumbnail maker that makes the thumbs from lossless versions of the artwork, not the published jpeg version, so images aren't compressed twice. I do this on sites I create; my thumbs are a *little* smaller and a *tiny* bit higher quality because of it.

* Automatic replacing of img tags with objects, where it is more compact, or with divs and css where the image isn't content. Not sure how you could tell, mind.

* Decision-tree induction to convert font tags to css. A lot of bad code could be made smaller this way.

* Moving embedded css into a separate file, where the same rules are used on several pages.

* Check out advpng, shrinks PNG images down by a few percent or so.

Ok!
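The "stripping comments, lf and cr" idea from the list above is the easiest one to prototype. Here is a rough sketch in Python, assuming only the standard re module; compact_html is a made-up name and this is not how HTML::Clean or htmlcrunch work internally. It ignores cases where whitespace or comments matter (pre and textarea blocks, inline scripts, IE conditional comments), so a real tool would need to protect those first.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # non-greedy, spans line breaks
WHITESPACE = re.compile(r"\s+")                 # runs of spaces, tabs, cr, lf


def compact_html(html: str) -> str:
    """Drop HTML comments, then collapse every whitespace run to one space."""
    without_comments = COMMENT.sub("", html)
    return WHITESPACE.sub(" ", without_comments).strip()


page = "<p>\r\n  Hello <!-- note to self -->\r\n world</p>\r\n"
print(compact_html(page))  # prints: <p> Hello world</p>
```

Collapsing to a single space rather than deleting whitespace outright keeps words and inline elements from running together, at the cost of a few bytes.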
Re: html compression tools (command line)
Jim Higson <jh@333.org> wrote in
news:dM6dnemdMck1CNDcRVn-vw@eclipse.net.uk:

> * Check out advpng, shrinks PNG images down by a few percent or so.

How does it compare to PNGOUT?
Re: html compression tools (command line)
Sam Hughes wrote:
> How does it compare to PNGOUT?

I just did a little test with the 9 small images you see at the top of my client's page here: http://www.masmodels.com/portfolio

advpng : 57.1k
pngout : 55.7k (97.5% of advpng)

so pngout is *slightly* better.

However, I sometimes run my scripts on a Linux/PPC computer, so programs distributed only as i386 binaries are not much use to me.

About pngout - I don't think I'll use it. Personally I don't much like software I can't modify, and like even less being directed to a 38k HTML file (plus images), where I am asked to wait in line for a 28k download!

--
Jim