Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Re: Data compression, date range

Reply
Thread Tools

Re: Data compression, date range

 
 
Michael Foukarakis
Guest
Posts: n/a
 
      02-22-2010
On Feb 22, 11:56*am, Branimir Maksimovic <(E-Mail Removed)> wrote:
> given
>
> const uint64_t maxDateValue = 6074000998ull;
>
> write decode/encode date range functions
>
> uint64_t encodeDate(uint64_t start, uint64_t stop);
> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>
> Interesting exercise


Where's the exercise? In trying to figure out what the exercise is?
 
Reply With Quote
 
 
 
 
Eric Sosman
Guest
Posts: n/a
 
      02-22-2010
On 2/22/2010 10:36 AM, Branimir Maksimovic wrote:
> Michael Foukarakis wrote:
>> On Feb 22, 11:56 am, Branimir Maksimovic <(E-Mail Removed)> wrote:
>>> given
>>>
>>> const uint64_t maxDateValue = 6074000998ull;
>>>
>>> write decode/encode date range functions
>>>
>>> uint64_t encodeDate(uint64_t start, uint64_t stop);
>>> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>>>
>>> Interesting exercise

>>
>> Where's the exercise? In trying to figure out what the exercise is?

> Exercise is to store from / to date in unsigned 64 bit integer,
> and retrieve it back. Constant is maximum number of ranges,
> that is distance from / to , that can be stored in uint64_t.
> Actually, largest range that can be stored is 0-6074000997ull.
> I just wanted to simplify spec as in original spec that constant wasn't
> given.
> I have solution, but will post it next Monday
> Date is eg number of seconds or integer mapped to date, anything...


Are we allowed to assume anything about the values of
start and stop? If 0 <= start <= stop < maxDateValue, for
example, the problem is trivial. With other conditions it
might be harder. (With only 0 <= start < maxDateValue and
0 <= stop < maxDateValue, there's no reversible encoding.)

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid
 
Reply With Quote
 
 
 
 
Ersek, Laszlo
Guest
Posts: n/a
 
      02-22-2010
In article <(E-Mail Removed)>, Michael Foukarakis <(E-Mail Removed)> writes:
> On Feb 22, 11:56=A0am, Branimir Maksimovic <(E-Mail Removed)> wrote:
>> given
>>
>> const uint64_t maxDateValue =3D 6074000998ull;
>>
>> write decode/encode date range functions
>>
>> uint64_t encodeDate(uint64_t start, uint64_t stop);
>> void decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop);
>>
>> Interesting exercise

>
> Where's the exercise? In trying to figure out what the exercise is?



I guess the exercise is:

The following inequalities hold for the "start" and "stop" params of
encodeDate():

0 <= start <= stop <= 6074000998

Compress the (start, stop) pair into a single uint64_t value, so that
decodeDate() can reproduce the individual components later.


--o--


6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
contains around 32.5 bits of information. If we wanted to store two such
*unrelated* numbers, we couldn't do that. However, we can throw away one
bit of information, because we can restore the ordering (a yes/no
decision for two numbers) without transmitting it explicitly.

Thus 32.5 + 31.5 = 64.


--o--


Let base B be decimal 3,037,000,499. The greatest base-B number
representable as an uint64_t is

2 | 3 | 2,746,052,116

(digits written in decimal).

Both start and stop are not greater than "2 | 0" in base B -- see
maxDateValue.

start = t | u (base B)
stop = v | w (base B)

The value sent over the wire will look this way in base-B:

case | w | u

Cases:

stop start w < u implies this case

v|w t|u

2|0 2|0 2
2|0 1|u * 2
2|0 0|u * 1
1|w 1|u 1
1|w 0|u * 0
0|w 0|u 0


Notice that where "w < u" does NOT hold, there the case digit equals
both t and v. Also notice that where the case digit is 2, there "w" is
0, thus "case | w | u" (base-B) will not exceed the limit described
above.

Implementation:


static const uint64_t base = UINT64_C(3037000499);

uint64_t
encodeDate(uint64_t start, uint64_t stop)
{
uint64_t
t = start / base,
u = start % base,
v = stop / base,
w = stop % base;

return
w < u
? (
2u == v
? base * base * (t + 1u) + u /* t+1 | 0 | u */
: w * base + u /* 0 | w | u */
)
: base * base * t /* t | w | u */
+ base * w
+ u;
}


void
decodeDate(uint64_t blob, uint64_t* start, uint64_t* stop)
{
uint64_t case = blob / (base * base);

/* Initialize final values with low base-B digits. */
*stop = blob / base % base; /* w */
*start = blob % base; /* u */

/* Set high base-B digit. */
if (*stop < *start) {
switch (case) {
case 2:
*start += base;

case 1:
*stop += base;

case 0:
*stop += base;
}
}
else {
*start += case * base;
*stop += case * base;
}
}


Cheers,
lacos
 
Reply With Quote
 
Keith Thompson
Guest
Posts: n/a
 
      02-22-2010
(E-Mail Removed) (Ersek, Laszlo) writes:
[...]
> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
> contains around 32.5 bits of information. If we wanted to store two such
> *unrelated* numbers, we couldn't do that. However, we can throw away one
> bit of information, because we can restore the ordering (a yes/no
> decision for two numbers) without transmitting it explicitly.
>
> Thus 32.5 + 31.5 = 64.

[...]

Now that I understand the problem, it becomes clear that the value
6074000998 was chosen very carefully so it's very close to 2**32.5
(or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
as an exponentiation operator and being sloppy with types; these
are mathematical expressions, not C expressions.

2.0**32.5 is approximately 6074000999.9521.
log2(607400099 is approximately 32.49999999953634.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Reply With Quote
 
Ersek, Laszlo
Guest
Posts: n/a
 
      02-22-2010
In article <(E-Mail Removed)>, Keith Thompson <(E-Mail Removed)> writes:
> (E-Mail Removed) (Ersek, Laszlo) writes:
> [...]
>> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
>> contains around 32.5 bits of information. If we wanted to store two such
>> *unrelated* numbers, we couldn't do that. However, we can throw away one
>> bit of information, because we can restore the ordering (a yes/no
>> decision for two numbers) without transmitting it explicitly.
>>
>> Thus 32.5 + 31.5 = 64.

> [...]
>
> Now that I understand the problem, it becomes clear that the value
> 6074000998 was chosen very carefully so it's very close to 2**32.5
> (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
> as an exponentiation operator and being sloppy with types; these
> are mathematical expressions, not C expressions.
>
> 2.0**32.5 is approximately 6074000999.9521.
> log2(607400099 is approximately 32.49999999953634.


Yes, I'm also curious where this problem comes from.

Branimir, was it a school / academic problem, or was the integer
6074000998 a hint from an already solved, bigger industrial problem (ie.
"what is the greatest integer maxDateValue, so that...")?

Thanks,
lacos
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      02-22-2010
On 2/22/2010 5:28 PM, Branimir Maksimovic wrote:
> Ersek, Laszlo wrote:
>> In article <(E-Mail Removed)>, Keith Thompson
>> <(E-Mail Removed)> writes:
>>> (E-Mail Removed) (Ersek, Laszlo) writes:
>>> [...]
>>>> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
>>>> contains around 32.5 bits of information. If we wanted to store two
>>>> such
>>>> *unrelated* numbers, we couldn't do that. However, we can throw away
>>>> one
>>>> bit of information, because we can restore the ordering (a yes/no
>>>> decision for two numbers) without transmitting it explicitly.
>>>>
>>>> Thus 32.5 + 31.5 = 64.
>>> [...]
>>>
>>> Now that I understand the problem, it becomes clear that the value
>>> 6074000998 was chosen very carefully so it's very close to 2**32.5
>>> (or, equivalently, 2**32 * sqrt(2)). Of course I'm using "**"
>>> as an exponentiation operator and being sloppy with types; these
>>> are mathematical expressions, not C expressions.
>>>
>>> 2.0**32.5 is approximately 6074000999.9521.
>>> log2(607400099 is approximately 32.49999999953634.

>>
>> Yes, I'm also curious where this problem comes from.
>>
>> Branimir, was it a school / academic problem, or was the integer
>> 6074000998 a hint from an already solved, bigger industrial problem (ie.
>> "what is the greatest integer maxDateValue, so that...")?
>>

>
> I don;t know somebody asked question on local programming forum,
> and lot of us tried to solve it. I was best with O(1),
> solution with second place guy logn solution.
> It was interesting that one guy invented complex
> compression formula which worked like 10 minutes to
> pack single range


Unbelievable! There are obviously N * (N+1) / 2 integers
X,Y satisfying 0 <= X <= Y < N. Could nobody think of an easy
way to make the transformation?

See also "triangular matrix." Don't the kids learn anything
these days?

--
Eric Sosman
(E-Mail Removed)lid
 
Reply With Quote
 
lovecreatesbeauty@gmail.c0m
Guest
Posts: n/a
 
      02-23-2010
On Feb 22, 5:22*pm, (E-Mail Removed) (Ersek, Laszlo) wrote:

> 6074000998 is 1,01101010,00001001,11100110,01100110 in binary. It
> contains around 32.5 bits of information. If we wanted to store two such
> *unrelated* numbers, we couldn't do that. However, we can throw away one
> bit of information, because we can restore the ordering (a yes/no
> decision for two numbers) without transmitting it explicitly.
>


can't understand, how can one manage to restore bits one by one
properly. which bit belongs to x and which belongs to y? your code
doesn't show this procedure thanks.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Data compression, date range Ersek, Laszlo C Programming 3 02-24-2010 02:04 AM
Date, date date date.... Peter Grison Java 10 05-30-2004 01:20 PM
Given a date, how to find the beginning date and ending date of that week Matt ASP .Net 1 11-08-2003 09:14 PM
Given a date, how to find the beginning date and ending date of that week Matt C Programming 3 11-08-2003 09:07 PM
Given a date, how to find the beginning date and ending date of that week Matt C++ 2 11-08-2003 08:30 PM



Advertisments