Velocity Reviews > inaccuracies in compression algorithm

# inaccuracies in compression algorithm

cornelis van der bent
Guest
Posts: n/a

 07-23-2009
I have a limited range of positive integer values that occur all over
my data to be compressed. After decompression these value may have
changed a little, +/- a fixed percentage. So we're talking about
lossy [de]compression here.

My idea is/was to compress a value by taking its log() and scaling
this up by a factor. The factor being just large enough to get an
integer compressed value with which the original value can be
calculated, give/take the allowed error percentage. I also subtract
an offset from the log() result to let my compressed range start at 0;
this to increase compression even further. The offset is log(<lowest-
value-in-range>).

By just including the above mentioned 'factor' and 'offset' in the
data, I can decompress these values.

Here's my code (I've copy pasted most from working source but typed
the test() routine here, so please forgive typos):

static double factor;
static double offset;

void test(int minValue, int maxValue, int errorPercentage)
{
double resulution;
int n;

resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
log(maxValue * (1.0 - (errorPercentage / 100.0)));

factor = 1 / resolution;

offset = log((double)minValue);

for (n = minValue; n < maxValue; n++)
{
int down = scaler->scaleDown(n);
int up = scaler->scaleUp(down);

printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
- n)) / (double)n ));
}
}

int scaleDown(int originalValue)
{
return (int)round((log((double)originalValue) - offset) * factor);
}

int scaleUp(int scaledValue)
{
return (int)round(exp(((double)scaledValue / factor) + offset));
}

When running test(200, 2000, 20), I see that the error is asymetrical:
Given a certain compressed value, lowest input values have > 20%
error, while on the other side the highest input values have < 20%
error.
1060 => 12 => 1297 => 22.36
....
1589 => 12 => 1297 => -18.38

Does anyone have an idea what causes this and how to fix it?

Thanks for listening!

Ben Bacarisse
Guest
Posts: n/a

 07-23-2009
cornelis van der bent <(E-Mail Removed)> writes:

> I have a limited range of positive integer values that occur all over
> my data to be compressed. After decompression these value may have
> changed a little, +/- a fixed percentage. So we're talking about
> lossy [de]compression here.
>
> My idea is/was to compress a value by taking its log() and scaling
> this up by a factor. The factor being just large enough to get an
> integer compressed value with which the original value can be
> calculated, give/take the allowed error percentage. I also subtract
> an offset from the log() result to let my compressed range start at 0;
> this to increase compression even further. The offset is log(<lowest-
> value-in-range>).
>
> By just including the above mentioned 'factor' and 'offset' in the
> data, I can decompress these values.
>
> Here's my code (I've copy pasted most from working source but typed
> the test() routine here, so please forgive typos):
>
> static double factor;
> static double offset;
>
> void test(int minValue, int maxValue, int errorPercentage)
> {
> double resulution;
> int n;
>
> resolution = log(maxValue * (1.0 + (errorPercentage / 100.0))) -
> log(maxValue * (1.0 - (errorPercentage / 100.0)));
>
> factor = 1 / resolution;
>
> offset = log((double)minValue);
>
> for (n = minValue; n < maxValue; n++)
> {
> int down = scaler->scaleDown(n);
> int up = scaler->scaleUp(down);

This is not C. Of course, the problem is not caused by this being C++
but then you don't have a C question either (see below).

> printf("%4d => %d => %d => %.2f\n", n, down, up, ((100.0 * (up
> - n)) / (double)n ));
> }
> }
>
> int scaleDown(int originalValue)
> {
> return (int)round((log((double)originalValue) - offset) * factor);
> }
>
> int scaleUp(int scaledValue)
> {
> return (int)round(exp(((double)scaledValue / factor) + offset));
> }
>
> When running test(200, 2000, 20), I see that the error is asymetrical:
> Given a certain compressed value, lowest input values have > 20%
> error, while on the other side the highest input values have < 20%
> error.
> 1060 => 12 => 1297 => 22.36
> ...
> 1589 => 12 => 1297 => -18.38
>
> Does anyone have an idea what causes this and how to fix it?

Two different input values map to the same scaled one (12) and this
maps to the same "uncompressed" output so, of course, the error is not
the same. There is really no answer to "what causes this" other than
the computation causes it. To fix it, use a different computation.

This is not a C question. You will get much better answers in a group
about compression or, failing that, a general programming group like
comp.programming.

--
Ben.

cornelis van der bent
Guest
Posts: n/a

 07-23-2009
On 23 jul, 13:33, Ben Bacarisse <(E-Mail Removed)> wrote:
> Two different input values map to the same scaled one (12) and this
> maps to the same "uncompressed" output so, of course, the error is not
> the same. *There is really no answer to "what causes this" other than
> the computation causes it. *To fix it, use a different computation.
>
> This is not a C question. *You will get much better answers in a group
> about compression or, failing that, a general programming group like
> comp.programming.

You are right Ben! I appologise for the inconvenience. Removed my
message.

Boon
Guest
Posts: n/a

 07-23-2009
cornelis van der bent wrote:

> I have a limited range of positive integer values that occur all over
> my data to be compressed. After decompression these value may have
> changed a little, +/- a fixed percentage. So we're talking about
> lossy [de]compression here.

Try comp.compression

Ben Bacarisse
Guest
Posts: n/a

 07-23-2009
Richard Heathfield <(E-Mail Removed)> writes:

> Ben Bacarisse said:
>
>> cornelis van der bent <(E-Mail Removed)> writes:
>>

> <snip>

<snip>
>>> int down = scaler->scaleDown(n);
>>> int up = scaler->scaleUp(down);

>>
>> This is not C.

>
> From reading the OP's reply to you, I see that you're right (as
> usual), but on this occasion I'm not sure that you deserved to be.
> There is nothing in the above function that is not legal C, as far
> as I can tell. We don't have a definition of scaler, but its type
> could easily be defined along these lines:
>
> struct S
> {
> int (*scaleDown)(int);
> int (*scaleUp(int);
> /* other stuff here */
> };
>
> All we need now for test() to be legal C is struct S scaler; and a
> couple of function pointer assignments elsecode. Since we weren't
> given the whole code, without further info from the OP we have no
> reason to suppose that this isn't the case in the OP's program.

You are quite right. It was just a guess and if I was correct it was
by accident. Of course, if this had been C there is a good chance it
would have been written:

int down = scaler->scaleDown(scaler, n);

but that is no reason to say the code was not C.

--
Ben.

Keith Thompson
Guest
Posts: n/a

 07-23-2009
cornelis van der bent <(E-Mail Removed)> writes:
> On 23 jul, 13:33, Ben Bacarisse <(E-Mail Removed)> wrote:

[...]
>> This is not a C question. Â*You will get much better answers in a group
>> about compression or, failing that, a general programming group like
>> comp.programming.

>
> You are right Ben! I appologise for the inconvenience. Removed my
> message.

FYI, removing a message rarely works. Because cancel messages
(the mechanism by which messages are removed) are often abused,
most servers ignore them. Once you post something, it's probably
there for good.

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed)
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Lew Pitcher
Guest
Posts: n/a

 07-23-2009
On July 23, 2009 09:22, in comp.lang.c, Richard Heathfield
((E-Mail Removed)) wrote:

> Ben Bacarisse said:
>
>> cornelis van der bent <(E-Mail Removed)> writes:
>>

> <snip>
>>>
>>> static double factor;
>>> static double offset;
>>>
>>> void test(int minValue, int maxValue, int errorPercentage)
>>> {
>>> double resulution;
>>> int n;
>>>
>>> resolution = log(maxValue * (1.0 + (errorPercentage /
>>> 100.0))) -
>>> log(maxValue * (1.0 - (errorPercentage /
>>> 100.0)));
>>>
>>> factor = 1 / resolution;
>>>
>>> offset = log((double)minValue);
>>>
>>> for (n = minValue; n < maxValue; n++)
>>> {
>>> int down = scaler->scaleDown(n);
>>> int up = scaler->scaleUp(down);

>>
>> This is not C.

>
> From reading the OP's reply to you, I see that you're right (as
> usual), but on this occasion I'm not sure that you deserved to be.
> There is nothing in the above function that is not legal C, as far
> as I can tell. We don't have a definition of scaler, but its type
> could easily be defined along these lines:
>
> struct S
> {
> int (*scaleDown)(int);
> int (*scaleUp(int);
> /* other stuff here */
> };
>
> All we need now for test() to be legal C is struct S scaler; and a
> couple of function pointer assignments elsecode. Since we weren't
> given the whole code, without further info from the OP we have no
> reason to suppose that this isn't the case in the OP's program.

Had we interpreted the OPs source code as C, I have a question...

The initializer value assigned to the OP's
int up
*depends* on the initializer value assigned to the OPs
int down

The OPs code depends on the order of initialization of two independant
declarations. I can't find anything in 9989-1999 that guarantees that
declarations will be 'executed' (i.e. storage allocated, initializations
applied) in the order that they were declared in.

If the C language does not make such a guarantee, then the OP's code (when
viewed as /C/ code) takes on "undefined behaviour" at the very least, as it
depends on the order of initialization to follow the order imposed in the
source code.

Am I interpreting this correctly? Can anyone point me to the place in
9989-1999 which guarantees that declarations will be 'executed' (in the way
I noted above) in order?

Thanks
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------

Morris Keesan
Guest
Posts: n/a

 07-23-2009
On Thu, 23 Jul 2009 09:22:59 -0400, Richard Heathfield
<(E-Mail Removed)> wrote:

> Ben Bacarisse said:
>
>> cornelis van der bent <(E-Mail Removed)> writes:

....
>>> int down = scaler->scaleDown(n);
>>> int up = scaler->scaleUp(down);

>>
>> This is not C.

>
> From reading the OP's reply to you, I see that you're right (as
> usual), but on this occasion I'm not sure that you deserved to be.
> There is nothing in the above function that is not legal C, as far
> as I can tell. We don't have a definition of scaler, but its type
> could easily be defined along these lines:
>
> struct S
> {
> int (*scaleDown)(int);
> int (*scaleUp(int);
> /* other stuff here */
> };
>
> All we need now for test() to be legal C is struct S scaler; and a
> couple of function pointer assignments elsecode.

Of course, you meant struct S *scaler, along with creation of a (struct S)
and assignment of its address to scaler.

Phil Carmody
Guest
Posts: n/a

 07-23-2009
Lew Pitcher <(E-Mail Removed)> writes:
> On July 23, 2009 09:22, in comp.lang.c, Richard Heathfield
> ((E-Mail Removed)) wrote:
>
>> Ben Bacarisse said:
>>
>>> cornelis van der bent <(E-Mail Removed)> writes:
>>>

>> <snip>
>>>>
>>>> static double factor;
>>>> static double offset;
>>>>
>>>> void test(int minValue, int maxValue, int errorPercentage)
>>>> {
>>>> double resulution;
>>>> int n;
>>>>
>>>> resolution = log(maxValue * (1.0 + (errorPercentage /
>>>> 100.0))) -
>>>> log(maxValue * (1.0 - (errorPercentage /
>>>> 100.0)));
>>>>
>>>> factor = 1 / resolution;
>>>>
>>>> offset = log((double)minValue);
>>>>
>>>> for (n = minValue; n < maxValue; n++)
>>>> {
>>>> int down = scaler->scaleDown(n);
>>>> int up = scaler->scaleUp(down);
>>>
>>> This is not C.

>>
>> From reading the OP's reply to you, I see that you're right (as
>> usual), but on this occasion I'm not sure that you deserved to be.
>> There is nothing in the above function that is not legal C, as far
>> as I can tell. We don't have a definition of scaler, but its type
>> could easily be defined along these lines:
>>
>> struct S
>> {
>> int (*scaleDown)(int);
>> int (*scaleUp(int);
>> /* other stuff here */
>> };
>>
>> All we need now for test() to be legal C is struct S scaler; and a
>> couple of function pointer assignments elsecode. Since we weren't
>> given the whole code, without further info from the OP we have no
>> reason to suppose that this isn't the case in the OP's program.

>
> Had we interpreted the OPs source code as C, I have a question...
>
> The initializer value assigned to the OP's
> int up
> *depends* on the initializer value assigned to the OPs
> int down
>
> The OPs code depends on the order of initialization of two independant
> declarations. I can't find anything in 9989-1999 that guarantees that
> declarations will be 'executed' (i.e. storage allocated, initializations
> applied) in the order that they were declared in.

Sequence points. The initialisations are ordered.

Phil
--
If GML was an infant, SGML is the bright youngster far exceeds
expectations and made its parents too proud, but XML is the
before he had sex, which was rape. -- Erik Naggum (1965-2009)

Lew Pitcher
Guest
Posts: n/a

 07-23-2009
On July 23, 2009 13:09, in comp.lang.c, Phil Carmody
((E-Mail Removed)) wrote:

> Lew Pitcher <(E-Mail Removed)> writes:

[snip]
>> The OPs code depends on the order of initialization of two independant
>> declarations. I can't find anything in 9989-1999 that guarantees that
>> declarations will be 'executed' (i.e. storage allocated, initializations
>> applied) in the order that they were declared in.

>
> Sequence points. The initialisations are ordered.

I /knew/ that there was something obvious that I was missing. Thanks

--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------