Velocity Reviews > Re: Data compression, date range

# Re: Data compression, date range

Michael Foukarakis
Guest
Posts: n/a

 02-22-2010
On Feb 22, 11:56*am, Branimir Maksimovic <(E-Mail Removed)> wrote:
> given
>
> const uint64_t maxDateValue = 6074000998ull;
>
> write decode/encode date range functions
>
> uint64_t encodeDate(uint64_t start, uint64_t stop);
> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>
> Interesting exercise

Where's the exercise? In trying to figure out what the exercise is?

Eric Sosman
Guest
Posts: n/a

 02-22-2010
On 2/22/2010 10:36 AM, Branimir Maksimovic wrote:
> Michael Foukarakis wrote:
>> On Feb 22, 11:56 am, Branimir Maksimovic <(E-Mail Removed)> wrote:
>>> given
>>>
>>> const uint64_t maxDateValue = 6074000998ull;
>>>
>>> write decode/encode date range functions
>>>
>>> uint64_t encodeDate(uint64_t start, uint64_t stop);
>>> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>>>
>>> Interesting exercise

>>
>> Where's the exercise? In trying to figure out what the exercise is?

> Exercise is to store from / to date in unsigned 64 bit integer,
> and retrieve it back. Constant is maximum number of ranges,
> that is distance from / to , that can be stored in uint64_t.
> Actually, largest range that can be stored is 0-6074000997ull.
> I just wanted to simplify spec as in original spec that constant wasn't
> given.
> I have solution, but will post it next Monday
> Date is eg number of seconds or integer mapped to date, anything...

Are we allowed to assume anything about the values of
start and stop? If 0 <= start <= stop < maxDateValue, for
example, the problem is trivial. With other conditions it
might be harder. (With only 0 <= start < maxDateValue and
0 <= stop < maxDateValue, there's no reversible encoding.)

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid

Ersek, Laszlo
Guest
Posts: n/a

 02-22-2010
In article <(E-Mail Removed)>, Michael Foukarakis <(E-Mail Removed)> writes:
> On Feb 22, 11:56=A0am, Branimir Maksimovic <(E-Mail Removed)> wrote:
>> given
>>
>> const uint64_t maxDateValue =3D 6074000998ull;
>>
>> write decode/encode date range functions
>>
>> uint64_t encodeDate(uint64_t start, uint64_t stop);
>> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>>
>> Interesting exercise

>
> Where's the exercise? In trying to figure out what the exercise is?

I guess the exercise is:

The following inequalities hold for the "start" and "stop" params of
encodeDate():

0 <= start <= stop <= 6074000998

Compress the (start, stop) pair into a single uint64_t value, so that
decodeDate() can reproduce the individual components later.

--o--

6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
contains around 32.5 bits of information. If we wanted to store two such
*unrelated* numbers, we couldn't do that. However, we can throw away one
bit of information, because we can restore the ordering (a yes/no
decision for two numbers) without transmitting it explicitly.

Thus 32.5 + 31.5 = 64.

--o--

Let base B be decimal 3,037,000,499. The greatest base-B number
representable as an uint64_t is

2 | 3 | 2,746,052,116

(digits written in decimal).

Both start and stop are not greater than "2 | 0" in base B -- see
maxDateValue.

start = t | u (base B)
stop = v | w (base B)

The value sent over the wire will look this way in base-B:

case | w | u

Cases:

stop start w < u implies this case

v|w t|u

2|0 2|0 2
2|0 1|u * 2
2|0 0|u * 1
1|w 1|u 1
1|w 0|u * 0
0|w 0|u 0

Notice that where "w < u" does NOT hold, there the case digit equals
both t and v. Also notice that where the case digit is 2, there "w" is
0, thus "case | w | u" (base-B) will not exceed the limit described
above.

Implementation:

static const uint64_t base = UINT64_C(3037000499);

uint64_t
encodeDate(uint64_t start, uint64_t stop)
{
uint64_t
t = start / base,
u = start % base,
v = stop / base,
w = stop % base;

return
w < u
? (
2u == v
? base * base * (t + 1u) + u /* t+1 | 0 | u */
: w * base + u /* 0 | w | u */
)
: base * base * t /* t | w | u */
+ base * w
+ u;
}

void
decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop)
{
uint64_t case = blob / (base * base);

/* Initialize final values with low base-B digits. */
*stop = blob / base % base; /* w */
*start = blob % base; /* u */

/* Set high base-B digit. */
if (*stop < *start) {
switch (case) {
case 2:
*start += base;

case 1:
*stop += base;

case 0:
*stop += base;
}
}
else {
*start += case * base;
*stop += case * base;
}
}

Cheers,
lacos

Keith Thompson
Guest
Posts: n/a

 02-22-2010
(E-Mail Removed) (Ersek, Laszlo) writes:
[...]
> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
> contains around 32.5 bits of information. If we wanted to store two such
> *unrelated* numbers, we couldn't do that. However, we can throw away one
> bit of information, because we can restore the ordering (a yes/no
> decision for two numbers) without transmitting it explicitly.
>
> Thus 32.5 + 31.5 = 64.

[...]

Now that I understand the problem, it becomes clear that the value
6074000998 was chosen very carefully so it's very close to 2**32.5
(or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
as an exponentiation operator and being sloppy with types; these
are mathematical expressions, not C expressions.

2.0**32.5 is approximately 6074000999.9521.
log2(607400099 is approximately 32.49999999953634.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ersek, Laszlo
Guest
Posts: n/a

 02-22-2010
In article <(E-Mail Removed)>, Keith Thompson <(E-Mail Removed)> writes:
> (E-Mail Removed) (Ersek, Laszlo) writes:
> [...]
>> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
>> contains around 32.5 bits of information. If we wanted to store two such
>> *unrelated* numbers, we couldn't do that. However, we can throw away one
>> bit of information, because we can restore the ordering (a yes/no
>> decision for two numbers) without transmitting it explicitly.
>>
>> Thus 32.5 + 31.5 = 64.

> [...]
>
> Now that I understand the problem, it becomes clear that the value
> 6074000998 was chosen very carefully so it's very close to 2**32.5
> (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
> as an exponentiation operator and being sloppy with types; these
> are mathematical expressions, not C expressions.
>
> 2.0**32.5 is approximately 6074000999.9521.
> log2(607400099 is approximately 32.49999999953634.

Yes, I'm also curious where this problem comes from.

Branimir, was it a school / academic problem, or was the integer
6074000998 a hint from an already solved, bigger industrial problem (ie.
"what is the greatest integer maxDateValue, so that...")?

Thanks,
lacos

Eric Sosman
Guest
Posts: n/a

 02-22-2010
On 2/22/2010 5:28 PM, Branimir Maksimovic wrote:
> Ersek, Laszlo wrote:
>> In article <(E-Mail Removed)>, Keith Thompson
>> <(E-Mail Removed)> writes:
>>> (E-Mail Removed) (Ersek, Laszlo) writes:
>>> [...]
>>>> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
>>>> contains around 32.5 bits of information. If we wanted to store two
>>>> such
>>>> *unrelated* numbers, we couldn't do that. However, we can throw away
>>>> one
>>>> bit of information, because we can restore the ordering (a yes/no
>>>> decision for two numbers) without transmitting it explicitly.
>>>>
>>>> Thus 32.5 + 31.5 = 64.
>>> [...]
>>>
>>> Now that I understand the problem, it becomes clear that the value
>>> 6074000998 was chosen very carefully so it's very close to 2**32.5
>>> (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
>>> as an exponentiation operator and being sloppy with types; these
>>> are mathematical expressions, not C expressions.
>>>
>>> 2.0**32.5 is approximately 6074000999.9521.
>>> log2(607400099 is approximately 32.49999999953634.

>>
>> Yes, I'm also curious where this problem comes from.
>>
>> Branimir, was it a school / academic problem, or was the integer
>> 6074000998 a hint from an already solved, bigger industrial problem (ie.
>> "what is the greatest integer maxDateValue, so that...")?
>>

>
> I don;t know somebody asked question on local programming forum,
> and lot of us tried to solve it. I was best with O(1),
> solution with second place guy logn solution.
> It was interesting that one guy invented complex
> compression formula which worked like 10 minutes to
> pack single range

Unbelievable! There are obviously N * (N+1) / 2 integers
X,Y satisfying 0 <= X <= Y < N. Could nobody think of an easy
way to make the transformation?

these days?

--
Eric Sosman
(E-Mail Removed)lid

lovecreatesbeauty@gmail.c0m
Guest
Posts: n/a

 02-23-2010
On Feb 22, 5:22*pm, (E-Mail Removed) (Ersek, Laszlo) wrote:

> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
> contains around 32.5 bits of information. If we wanted to store two such
> *unrelated* numbers, we couldn't do that. However, we can throw away one
> bit of information, because we can restore the ordering (a yes/no
> decision for two numbers) without transmitting it explicitly.
>

can't understand, how can one manage to restore bits one by one
properly. which bit belongs to x and which belongs to y? your code
doesn't show this procedure thanks.