Velocity Reviews - Computer Hardware Reviews


compress a short string to an even shorter string

 
 
Austin
 
      11-28-2003
I am looking for a way to compress a short string into a shorter string
and then un-compress it at the other end.

To be more specific, I have a site with an extremely long set of
parameters in the URL. I want to compress these into one very short
parameter, which I can then un-compress to read them all back. I have
already taken other steps to reduce the length.

To throw a bit more information into the pot: the string is made up of
[a-z0-9], and the compressed string can use any UTF-8 characters.

Our server-side code is J2EE Java, and I will be using it for the
compression/un-compression.

I have tried Gzip, and briefly looked at Triple DES encryption, not
because I want to make this secure, but because for some reason I
thought it might give me a short encrypted string.
 
 
 
 
 
Thomas Schodt
 
      11-28-2003
Are we to assume the String is transmitted in UTF-8?
(A Java String uses 16 bits per character; UTF-8 needs only 8 for [a-z0-9].)


Two approaches come to mind (there may be more).

Base conversion.
A base-36 String [0-9a-z] takes 16 bits to encode each digit.
You can convert it to base 16 using BigInteger:
String asHex = new BigInteger(s, 36).toString(16); // s holds the base-36 digits
and from that create a byte array encoding 2 hex digits in each byte.
The byte array will take up about a third of the space of the original
String (about 5.17 of every 16 bits).
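A minimal sketch of the base-conversion idea, assuming Java 8+ (for java.util.Base64). It skips the hex step, since BigInteger.toByteArray() yields the packed bytes directly. The class and method names are hypothetical. A sentinel '1' is prepended before parsing because BigInteger would otherwise drop leading '0' digits. To go back into a URL the bytes must be re-encoded as text; URL-safe Base64 carries 6 bits per character, so the resulting token is only modestly shorter than the base-36 original (roughly 5.17/6 of its length).

```java
import java.math.BigInteger;
import java.util.Base64;

// Hypothetical helper: packs a [0-9a-z] string into bytes by reading it as one base-36 number.
class Base36Packer {

    static byte[] pack(String digits) {
        if (!digits.matches("[0-9a-z]*")) {
            throw new IllegalArgumentException("expected only [0-9a-z]: " + digits);
        }
        // The sentinel '1' keeps leading '0' digits from being lost as leading zeros.
        return new BigInteger("1" + digits, 36).toByteArray();
    }

    static String unpack(byte[] packed) {
        // toString(36) emits lowercase digits; substring(1) drops the sentinel.
        return new BigInteger(packed).toString(36).substring(1);
    }

    // URL-safe text form of the packed bytes (about 6 bits per character).
    static String toUrlToken(String digits) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(pack(digits));
    }

    static String fromUrlToken(String token) {
        return unpack(Base64.getUrlDecoder().decode(token));
    }

    public static void main(String[] args) {
        String params = "00page2sortascfilterredsize10";
        byte[] packed = pack(params);
        String token = toUrlToken(params);
        System.out.println(params.length() + " digits -> " + packed.length + " bytes -> "
                + token.length() + "-char token " + token);
        System.out.println("round trip ok: " + params.equals(fromUrlToken(token)));
    }
}
```

The sentinel costs at most one extra digit; without it, "007" and "7" would pack to the same bytes.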

Lempel-Ziv compression.
If your data contains repetitions you'll want to have a
look at the LZ compression algorithm.
Google "Lempel-Ziv Java".
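The JDK already ships an LZ77-based compressor: java.util.zip.Deflater implements DEFLATE (LZ77 plus Huffman coding). A minimal sketch, with hypothetical class and method names, using the zlib format rather than gzip. Gzip wraps its output in at least a 10-byte header and an 8-byte trailer, while zlib adds only a 2-byte header and a 4-byte Adler-32 checksum, which matters for short inputs. As noted above, this pays off only when the data repeats itself; a short string with no repetition will typically come out slightly longer.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper around java.util.zip's DEFLATE (LZ77 + Huffman) in zlib format.
class LzCodec {

    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib memory
        }
    }

    static String decompress(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[256];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and not finished: input is truncated or needs a preset dictionary.
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid zlib data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) sb.append("item").append(i).append("colorred");
        String text = sb.toString();
        byte[] packed = compress(text);
        System.out.println(text.length() + " chars -> " + packed.length + " bytes");
        System.out.println("round trip ok: " + text.equals(decompress(packed)));
    }
}
```

The compressed bytes would still need a URL-safe text encoding (such as URL-safe Base64) before going into a parameter.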
 
 
 
 
 
Thomas Weidenfeller
 
      11-28-2003
(E-Mail Removed) (Austin) writes:
> To be more specific I have a site with an extremely long set of
> parameters in the URL, I want to compress these and use one very short
> parameter which I can then un-compress and read them all back. I have
> already taken other steps to reduce the length.


The usual way to handle this is to change from an HTTP GET request with
a long URL in the request line to an HTTP POST request with the data in
the body of the request.
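A minimal client-side sketch of that switch using only the JDK's HttpURLConnection; the class and method names are hypothetical. The parameters keep the same application/x-www-form-urlencoded "name=value&name=value" form they would have had after the '?', but travel in the request body, which avoids the URL length limits that browsers and servers impose. On the J2EE side, HttpServletRequest.getParameter() reads form-encoded POST parameters the same way it reads query-string ones.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical client helper: sends parameters in a POST body instead of the URL.
class FormPoster {

    // Builds "k1=v1&k2=v2", the same text that would otherwise follow '?' in a GET URL.
    static String formBody(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) sb.append('&');
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends the form body to the given URL and returns the HTTP status code.
    static int post(String url, Map<String, String> params) throws IOException {
        byte[] body = formBody(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type",
                    "application/x-www-form-urlencoded; charset=UTF-8");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("page", "2");
        params.put("sort", "asc");
        System.out.println(formBody(params)); // prints page=2&sort=asc
    }
}
```

In a browser the equivalent change is a form with method="post" instead of a link carrying the query string.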

/Thomas
 
 
jpshahom
 
      11-28-2003
(E-Mail Removed) (Austin) wrote in message news:<(E-Mail Removed). com>...

How about keeping the parameters in a Hashtable on the server side and passing only a short key in the URL, if enough of your data is repetitive?
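One reading of this suggestion, sketched minimally: keep the full parameter string on the server and put only a short key in the URL. The class name is hypothetical, and the sketch uses ConcurrentHashMap (the thread-safe successor to Hashtable) with keys drawn from a base-36 counter. It is in-memory only, so keys are lost on restart and are not shared across a cluster; a real deployment would need persistent or shared storage and some expiry policy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical server-side table mapping short keys to full parameter strings.
class ParamStore {
    private final ConcurrentMap<String, String> paramsByKey = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, String> keyByParams = new ConcurrentHashMap<>();
    private final AtomicLong counter = new AtomicLong();

    // Returns a short base-36 key; identical parameter strings reuse the same key.
    String shorten(String params) {
        return keyByParams.computeIfAbsent(params, p -> {
            String key = Long.toString(counter.incrementAndGet(), 36);
            paramsByKey.put(key, p); // stored before the key is handed out
            return key;
        });
    }

    // Returns the original parameters, or null if the key is unknown.
    String expand(String key) {
        return paramsByKey.get(key);
    }

    public static void main(String[] args) {
        ParamStore store = new ParamStore();
        String key = store.shorten("page2sortascfilterredsize10");
        System.out.println("?k=" + key + " -> " + store.expand(key));
    }
}
```

Reusing keys for repeated parameter sets is what makes this pay off when, as suggested, much of the data repeats.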


 
 
nos
 
      11-29-2003
I think base 36 needs only about 5.17 bits (log2 36) per digit,
not 16 bits.
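A rough worked check of that figure for an assumed 100-digit string, comparing the packed size with the same digits as UTF-8 and as 16-bit Java chars:

```latex
\begin{aligned}
\log_2 36 &= 2\log_2 6 \approx 5.17 \text{ bits per digit} \\
\lceil 100 \log_2 36 \rceil &= 517 \text{ bits} \;\Rightarrow\; \lceil 517/8 \rceil = 65 \text{ bytes} \\
\text{UTF-8: } 100 \times 8 &= 800 \text{ bits} = 100 \text{ bytes} \\
\text{Java chars: } 100 \times 16 &= 1600 \text{ bits} = 200 \text{ bytes}
\end{aligned}
```

So the packed form is about a third of the 16-bit String storage (65/200), but roughly two thirds of the same digits sent as UTF-8 (65/100).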

"Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
news:Xns944195BE8137Exenoc@158.152.254.254...
> A base-36 String [0-9a-z] takes 16 bits to encode each digit.
 
 
Thomas Schodt
 
      11-29-2003
"nos" <(E-Mail Removed)> wrote in news:T4Wxb.249368$9E1.1349089@attbi_s52:

> "Thomas Schodt" <news0310@xenoc.$DEMON.co.uk> wrote in message
> news:Xns944195BE8137Exenoc@158.152.254.254...

....
>> A base-36 String [0-9a-z] takes 16 bits to encode each digit.


> i think base 36 needs only 5.17 bits for each digit
> not 16 bits


Yes, but
encoded in a [java.lang.]String it will take 16 bits per digit.
 
 
Tim Tyler
 
      12-02-2003
nos <(E-Mail Removed)> wrote or quoted:

> i think base 36 needs only 5.17 bits for each digit
> not 16 bits


...not that URLs are anything like base 36: they can be case
sensitive, for one thing...
--
__________
|im |yler http://timtyler.org/ (E-Mail Removed) Remove lock to reply.
 
 
Tim Tyler
 
      12-02-2003
Tim Tyler <(E-Mail Removed)> wrote or quoted:
> nos <(E-Mail Removed)> wrote or quoted:


>> i think base 36 needs only 5.17 bits for each digit
>> not 16 bits

>
> ...not that URLs are in anything like base 36 - they can be case
> sensitive for one thing...


I see the OP /did/ say his strings were of this form, though it seems
rather surprising that there are no "=" or "&" characters involved.
 
 
 
 

Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
Difference of extern short *x and extern short x[]? | Andre | C Programming | 5 | 07-17-2012 07:38 PM
unsigned short, short literals | Ioannis Vranos | C Programming | 5 | 03-05-2008 01:25 AM
longs, long longs, short short long ints . . . huh?! | David Geering | C Programming | 15 | 01-11-2007 09:39 PM
unsigned short short? | slougheed@gmail.com | C++ | 4 | 10-16-2006 11:25 PM
NTFS compress VS folder compress in x64 | jienmin | Windows 64bit | 1 | 09-02-2005 06:01 AM


