Velocity Reviews - Computer Hardware Reviews


zlib.h

 
 
Andrea Crotti
      07-27-2010
Hi everyone, I would like to compress some data: in practice, IP
packets that have to be compressed, split into chunks and sent over
the network.

I've seen zlib.h and it looks nice, but I have some trouble
understanding how it works.

I tried to understand the zpipe example and modified it for my needs,
and I got something like the following
(first the struct, then the function):
--8<---------------cut here---------------start------------->8---
typedef struct {
    unsigned char *stream;
    int len;
} payload_t;
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
// compress original data in the given result
int payload_compress(payload_t data, payload_t *result) {
    int ret, flush;
    unsigned have;
    z_stream strm;

    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, LEVEL);
    if (ret != Z_OK)
        return ret;

    // is this thing enough for it?
    strm.next_in = data.stream;

    ret = deflate(&strm, Z_FINISH); /* no bad return value */
    // the initialization was successful
    assert(ret != Z_STREAM_ERROR); /* state not clobbered */

    deflateEnd(&strm);
    return Z_OK;
}
--8<---------------cut here---------------end--------------->8---

What I don't understand is:
- is setting "next_in" enough to hand the data to the compressor?
- how do I write to my "result" object?
- why does it say (here http://www.zlib.net/zlib_how.html)
  " CHUNK is simply the buffer size for feeding data to and pulling data
  from the zlib routines."
  Does that mean that it should not be the size of the packet to
  compress, but just the size of the working buffer?

And one last thing: how do I find out the size of the compressed data?

I was spoiled by Python, where it was just
data = zlib.compress(data)

Thanks a lot!
 
Andrea Crotti
      07-27-2010
In the end I luckily found the man page, and I think I understood something:

--8<---------------cut here---------------start------------->8---
int payload_compress(payload_t data, payload_t result) {
    int ret;
    z_stream strm;
    // maybe out should be bigger
    unsigned char *in = (unsigned char *) data.stream;
    unsigned char *out = (unsigned char *) result.stream;

    /* allocate deflate state */
    // initialize the structures
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, LEVEL);
    // if initialized correctly
    if (ret != Z_OK)
        return ret;

    strm.avail_in = data.len;
    strm.avail_out = data.len;

    // is this thing enough for it?
    strm.next_in = in;
    strm.next_out = out;
    ret = deflate(&strm, Z_NO_FLUSH); /* no bad return value */
    printf("now ret in and out = %d, %d, %d\n", ret, strm.avail_in, strm.avail_out);
    // the initialization was successful
    assert(ret != Z_STREAM_ERROR); /* state not clobbered */

    return Z_OK;
}
--8<---------------cut here---------------end--------------->8---

The thing is that I want to compress only ONCE: the data is normally
at most 1000 bytes, so I don't think I need to split it into chunks.

But something very strange is happening: testing this function apparently
works, BUT I get a SIGSEGV (or sometimes a bus error) when the program
quits (on return 0 in main), and I get no other errors before that...
 
Ersek, Laszlo
      07-27-2010
On Tue, 27 Jul 2010, Andrea Crotti wrote:

> Hi everyone, I would like to compress some data, in practice some ip
> packets that have to be compressed and then sent chunked and sent over
> the network.
>
> I've seen zlib.h and it looks nice, but I have some trouble
> understanding how it works.


See the manual [0].

The basic idea is this: you provide "input data" in

*next_in .. *(next_in + avail_in - 1)

(inclusive), and provide "output space" in

*next_out .. *(next_out + avail_out - 1)

(inclusive). You call the corresponding (compression or decompression)
routine. In the most common case, the routine returns due to one of these
(normal) conditions:
- output space ran out,
- input data ran out,
- (theoretically possible) both of these at the same time.

You need to "fix" all these conditions before calling the function again
(ie. provide more input if needed, *and* provide more output space if
needed).

The whole loop resembles (to me at least) how iconv() is used, and to some
extent, select(). In some sense, you don't call the (de)compression
function so that it serves you. You rather set an automaton in motion and
care for its needs. It just happens to spit out (de)compressed data. It
resembles "event driven programming", ie. the automaton is a black box and
returns "events" for you to handle ("add more input", "provide more output
space"). You need to look at the input and the output independently.

Disclaimer: I wrote the above mostly from memory, based on what I remember
from the libbzip2 (not zlib) API. Sorry. Still, "[t]he structure of
libbzip2's interfaces is similar to that of Jean-loup Gailly's and Mark
Adler's excellent zlib library" [1].

If you need an upper bound on the compressed size of the plaintext data
before actually doing a single-shot compression (you mentioned Z_FINISH),
see deflateBound() [2] in the zlib docs.

For "packet compression", you might want to consider LZO [3] [4]. For
example, OpenVPN employs "fast LZO compression" -- search [5] for
"comp-lzo".

I'll also mention QuickLZ [6] (their words: "QuickLZ is the world's
fastest compression library (really)"). I didn't test QuickLZ itself, but
tamp [7] is based on it, and tamp was actually faster than anything else
I've seen (for the compression efficiency it provided).

lacos

[0] http://zlib.net/manual.html
[1] http://bzip.org/1.0.5/bzip2-manual-1.0.5.html#top-level
[2] http://zlib.net/manual.html#Advanced
[3] http://www.oberhumer.com/opensource/lzo/#abstract
[4] http://www.oberhumer.com/opensource/lzo/#lzop
[5] http://www.openvpn.net/index.php/ope...penvpn-21.html
[6] http://www.quicklz.com/
[7] http://blogs.sun.com/timc/entry/tamp...multi_threaded
 
Andrea Crotti
      07-27-2010
"Ersek, Laszlo" <> writes:

>
> See the manual [0].
>
> The basic idea is this: you provide "input data" in
>
> *next_in .. *(next_in + avail_in - 1)
>
> (inclusive), and provide "output space" in
>
> *next_out .. *(next_out + avail_out - 1)
>
> (inclusive). You call the corresponding (compression or decompression)
> routine. In the most common case, the routine returns due to one of
> these (normal) conditions:
> - output space ran out,
> - input data ran out,
> - (theoretically possible) both of these at the same time.
>
> You need to "fix" all these conditions before calling the function
> again (ie. provide more input if needed, *and* provide more output
> space if needed).
>
> The whole loop resembles (to me at least) how iconv() is used, and to
> some extent, select(). In some sense, you don't call the
> (de)compression function so that it serves you. You rather set an
> automaton in motion and care for its needs. It just happens to spit
> out (de)compressed data. It resembles "event driven programming",
> ie. the automaton is a black box and returns "events" for you to
> handle ("add more input", "provide more output space"). You need to
> look at the input and the output independently.


Thanks a lot for the nice explanation. A couple more questions.

What if I know I want to compress in only one shot; why do I still need
a loop?

And do I really need to call deflateEnd/inflateEnd, or is it not
compulsory? Of course without it nothing gets freed, but I don't think
that is the reason for my segfault...
 
Tom St Denis
      07-27-2010
On Jul 27, 11:11 am, "Ersek, Laszlo" <la...@caesar.elte.hu> wrote:
> For "packet compression", you might want to consider LZO [3] [4]. For
> example, OpenVPN employs "fast LZO compression" -- search [5] for
> "comp-lzo".
>
> I'll also mention QuickLZ [6] (their words: "QuickLZ is the world's
> fastest compression library (really)"). I didn't test QuickLZ itself, but
> tamp [7] is based on it, and tamp was actually faster than anything else
> I've seen (for the compression efficiency it provided).


QuickLZ-C reminds me of LZRW1 from yesteryear. I've actually used
LZRW1 in a NES emulator for the GBA where I compressed saved games
instead of using their default RLE.

LZRW1 is also free for all purposes [unlike QLZ].

Tom
 
Andrea Crotti
      07-27-2010
Andrea Crotti <> writes:

>
> What if I know I want to compress in only one shot, why do I still need
> to have a loop?
>
> And do I really need to do a deflateEnd/inflateEnd or is not compulsory?
> I mean of course I don't free everything without, but I don't think is
> that the reason of my segfaulting...


Hmm, I guess I'm doing something wrong with the pointers, then. I pass
to the compression function something like:

...
unsigned char *in = (unsigned char *) data.stream;
unsigned char *out = (unsigned char *) result.stream;
...
strm.next_in = in;
strm.next_out = out;


where the buffers are allocated something like this:
--8<---------------cut here---------------start------------->8---
stream_t *data_msg = malloc(sizeof(stream_t) * size);
stream_t *result_msg = malloc(sizeof(stream_t) * size);
stream_t *compr_msg = malloc(sizeof(stream_t) * size);
--8<---------------cut here---------------end--------------->8---

And the size of the output buffer is equal to that of the input.
It looks very simple; where could the mistake be?
 
Ersek, Laszlo
      07-27-2010
On Tue, 27 Jul 2010, Andrea Crotti wrote:

> What if I know I want to compress in only one shot, why do I still need
> to have a loop?


You don't. The loop is only necessary if at least one of the following
is unknown to you in advance: the size of the input, or the size of the
processed output.


> And do I really need to do a deflateEnd/inflateEnd or is not compulsory?
> I mean of course I don't free everything without, but I don't think is
> that the reason of my segfaulting...


----v----
If the parameter flush is set to Z_FINISH, pending input is processed,
pending output is flushed and deflate returns with Z_STREAM_END if there
was enough output space; if deflate returns with Z_OK, this function must
be called again with Z_FINISH and more output space (updated avail_out)
but no more input data, until it returns with Z_STREAM_END or an error.
After deflate has returned Z_STREAM_END, the only possible operations on
the stream are deflateReset or deflateEnd.
----^----

Besides deflateReset(), you might want to look at Z_FULL_FLUSH (instead of
Z_FINISH).

lacos
 
Andrea Crotti
      07-27-2010
"Ersek, Laszlo" <> writes:

> On Tue, 27 Jul 2010, Andrea Crotti wrote:
>
>> What if I know I want to compress in only one shot, why do I still
>> need to have a loop?

>
> You don't. The loop should be necessary if at least one of the
> following was unknown to you in advance: size of input, size of
> processed output.
>
>
>> And do I really need to do a deflateEnd/inflateEnd or is not
>> compulsory? I mean of course I don't free everything without, but I
>> don't think is that the reason of my segfaulting...

>
> ----v----
> If the parameter flush is set to Z_FINISH, pending input is processed,
> pending output is flushed and deflate returns with Z_STREAM_END if
> there was enough output space; if deflate returns with Z_OK, this
> function must be called again with Z_FINISH and more output space
> (updated avail_out) but no more input data, until it returns with
> Z_STREAM_END or an error. After deflate has returned Z_STREAM_END, the
> only possible operations on the stream are deflateReset or deflateEnd.
> ----^----
>
> Beside deflateReset(), you might want to look at Z_FULL_FLUSH (instead
> of Z_FINISH).
>
> lacos


Hmm, looking at the docs, Z_FINISH looks better to me. Anyway, IT WORKS!!
It's a bit brutal, since I use an assertion to check that one pass was
enough:
assert(ret == Z_STREAM_END);
but it's perfectly fine for now.

For anyone who is interested, here is the code:
http://gist.github.com/492738

And of course any advice on style, or on bugs, is welcome.
 
Ersek, Laszlo
      07-27-2010
On Tue, 27 Jul 2010, Andrea Crotti wrote:

> Mm I guess I'm doing something wrong with the pointers then, I pass to
> compress something like:
>
> ...
> unsigned char *in = (unsigned char *) data.stream;
> unsigned char *out = (unsigned char *) result.stream;
> ...
> strm.next_in = in;
> strm.next_out = out;
>
>
> where the stream is something like
> --8<---------------cut here---------------start------------->8---
> stream_t *data_msg = malloc(sizeof(stream_t) * size);
> stream_t *result_msg = malloc(sizeof(stream_t) * size);
> stream_t *compr_msg = malloc(sizeof(stream_t) * size);
> --8<---------------cut here---------------end--------------->8---
>
> And the size of the output is equal to the input.
> It looks very simple where could be the mistake?


Unless you can instruct any compiler that will compile the code to lay out
"stream_t" in a specific way, ie. padding, byte order and encoding, the
above seems very wrong. The usual steps should be:

- architecture-dependent C language structs
- architecture-independent, serialized wire format ("byte array")
- compressed bit-stream

Your code seems to lack the middle phase when providing input for
compression, and then it seems to store the compressed bit-stream
erroneously. The latter can't be treated as anything else than "array of
char unsigned", so the explicit pointer conversion when assigning to "out"
("next_out") is immediately suspicious.

Furthermore, you should avoid defining type names ending with "_t" if
you're on a POSIX/UNIX platform [0].

lacos

[0] http://www.opengroup.org/onlinepubs/...l#tag_15_02_02
 
Andrea Crotti
      07-27-2010
> Unless you can instruct any compiler that will compile the code to lay
> out "stream_t" in a specific way, ie. padding, byte order and
> encoding, the above seems very wrong. The usual steps should be:
>
> - architecture-dependent C language structs
> - architecture-independent, serialized wire format ("byte array")
> - compressed bit-stream
>
> Your code seems to lack the middle phase when providing input for
> compression, and then it seems to store the compressed bit-stream
> erroneously. The latter can't be treated as anything else than "array
> of char unsigned", so the explicit pointer conversion when assigning
> to "out" ("next_out") is immediately suspicious.
>
> Furthermore, you should avoid defining type names ending with "_t" if
> you're on a POSIX/UNIX platform [0].
>
> lacos
>
> [0] http://www.opengroup.org/onlinepubs/...l#tag_15_02_02


Well, I understood that those types, while compatible, are just how the
data is represented, so it should not be a problem.

I'm not sure why I need the second phase at all... And, more importantly,
what does it mean in practice, in code?

After all, stream_t has to be put as the payload of some network packets
and then reconstructed on the other side (which works nicely now, at
least without the compression).

I had made some stupid mistakes before; now it seems to work: I
generate random data, compress it, decompress it and assert that it's
still the same.

Thanks a lot!
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off




Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57