
# How to read an integer one by one?

Skarmander
Guest
Posts: n/a

 12-08-2005
Richard Heathfield wrote:
> Skarmander said:
>
>> Dag-Erling Smørgrav wrote:
>>> This one is slightly wasteful of memory, but is computed entirely at
>>> compile time:
>>>
>>> #include <limits.h>
>>> char number[1+(sizeof(int)*CHAR_BIT)/3+1];

>
> That looks familiar, but... well, enough of that later.
>
>>> The resulting array will be large enough to store the null-terminated
>>> decimal representation of any integer in the range (INT_MIN,INT_MAX).
>>> Proof of this is left as an exercise for the reader.

>> Actually, please show why this works, especially why this works for
>> INT_MIN. It's easy enough to see N / 3 + 1 can approximate
>> ceil(log10(2^N)), but I can't get the details quite right for signed
>> integers. I'm sure I'm overlooking something obvious.

>
> I can explain it easily enough, for the simple reason that I invented it.
> (I'm quite sure there are other people who've invented it too, and
> long before I did, but at any rate I derived it independently, and so I
> know why it works.)
>
> An int comprises sizeof(int) * CHAR_BIT bits. Since three bits can always
> represent any octal digit, and leaving signs and terminators aside for a
> second, it is clear that (sizeof(int) * CHAR_BIT) / 3 characters are
> sufficient to store the octal representation of the number (but see below).
> Since decimal can represent more efficiently than octal, what's good enough
> for octal is also good enough for decimal.
>
> Now we add in 1 for the sign, and 1 for the null terminator, and that's
> where the above expression comes from.
>

Interesting. Yes, that's probably more straightforward than the brute force
I applied: to represent a number n in base b, you need floor(log_b(|n|)) + 1
digits (for n != 0). sizeof(int) * CHAR_BIT gives us an upper bound for the
value bits of an integer, and hence an upper bound for (U)INT_MAX: 2^N with
N = sizeof(int) * CHAR_BIT.

Now to represent N-bit integers in decimal, we need at most ceil(log10(2^N))
= ceil(N * log10(2)) <= floor(N * log10(2)) + 1 = floor(N / log2(10)) + 1
digits. "3" is exactly right as log2(8) for octal, and as you say, good
enough for decimal, since log2(10) > 3. Better approximations of log2(10)
give tighter bounds.

> BUT: consider the possibility that the integer division truncates (which it
> will do if sizeof(int) * CHAR_BIT is not a multiple of 3). Under such
> circumstances, you could be forgiven for wanting some bodge factor in
> there. That's why I use:
>
> char number[1 + (sizeof(int) * CHAR_BIT + 2) / 3 + 1];
>
> The + 2 is always sufficient to counter the effects of any truncation after
> division by 3, but doesn't inflate the result by more than one character at
> most.
>

<snip>
Yes, and I won't dispute the practicality of this approach as opposed to my
mysterious logarithm approximations, but the challenge is this: prove that
no bodge factor is necessary (it isn't). sizeof(int) * CHAR_BIT / 3 + 1
always gives enough space for all the digits and a sign as long as the
integer type is wider than 11 bits, which int is guaranteed to be (except
for char, of course: signed char does need correction if CHAR_BIT is 8).

If I'm not mistaken, this is the same as proving that floor((N - 1) *
log10(2)) + 1 <= N / 3 for all N >= 12 (one value bit less for signed
integers, but one character more). Like I said, I'm sure I'm overlooking
something simple, some transformation that will highlight this.

I guess I should have paid better attention in math class. I like to stick
to boolean algebra...

S.

Dag-Erling Smørgrav

 12-08-2005
Skarmander <(E-Mail Removed)> writes:
> Dag-Erling Smørgrav wrote:
> > #include <limits.h>
> > char number[1+(sizeof(int)*CHAR_BIT)/3+1];
> > The resulting array will be large enough to store the null-terminated
> > decimal representation of any integer in the range (INT_MIN,INT_MAX).
> > Proof of this is left as an exercise for the reader.

> Actually, please show why this works, especially why this works for
> INT_MIN. It's easy enough to see N / 3 + 1 can approximate
> ceil(log10(2^N)), but I can't get the details quite right for signed
> integers. I'm sure I'm overlooking something obvious.

There are three terms in the addition.

DES
--
Dag-Erling Smørgrav - (E-Mail Removed)

Skarmander

 12-08-2005
Dag-Erling Smørgrav wrote:
> Skarmander <(E-Mail Removed)> writes:
>> Dag-Erling Smørgrav wrote:
>>> #include <limits.h>
>>> char number[1+(sizeof(int)*CHAR_BIT)/3+1];
>>> The resulting array will be large enough to store the null-terminated
>>> decimal representation of any integer in the range (INT_MIN,INT_MAX).
>>> Proof of this is left as an exercise for the reader.

>> Actually, please show why this works, especially why this works for
>> INT_MIN. It's easy enough to see N / 3 + 1 can approximate
>> ceil(log10(2^N)), but I can't get the details quite right for signed
>> integers. I'm sure I'm overlooking something obvious.

>
> There are three terms in the addition.
>

So sizeof(int) * CHAR_BIT / 3 is the term supposed to approximate the number
of digits, and the +1 is for the sign? Hm. Crude.

S.

Dag-Erling Smørgrav

 12-08-2005
Richard Heathfield <(E-Mail Removed)> writes:
> I can explain it easily enough, for the simple reason that I
> invented it.

That's a bit presumptuous, don't you think? It's an obvious solution
to anyone who understands logarithms.

> An int comprises sizeof(int) * CHAR_BIT bits. Since three bits can
> always represent any octal digit, and leaving signs and terminators
> aside for a second, it is clear that (sizeof(int) * CHAR_BIT) / 3
> characters are sufficient to store the octal representation of the
> number (but see below). Since decimal can represent more
> efficiently than octal, what's good enough for octal is also good
> enough for decimal.

Octal never entered my mind; 3 is simply a good enough approximation
of log2(10).

> BUT: consider the possibility that the integer division truncates
> (which it will do if sizeof(int) * CHAR_BIT is not a multiple of
> 3). Under such circumstances, you could be forgiven for wanting some
> bodge factor in there. That's why I use:
>
> char number[1 + (sizeof(int) * CHAR_BIT + 2) / 3 + 1];

Turns out the bodge factor is never necessary on a two's complement
machine.

DES
--
Dag-Erling Smørgrav - (E-Mail Removed)

Jordan Abel

 12-08-2005
On 2005-12-08, Dag-Erling Smørgrav <(E-Mail Removed)> wrote:
> Turns out the bodge factor is never necessary on a two's complement
> machine.

two's complement has nothing to do with it.

take 32 bits.

32/3=10

your assertion implies [essentially] that 10 digits are sufficient for
the _octal_ representation of 2^32-1, which actually needs 11: 37777777777.

For n < 9, ceil(log[10](2^n)) exceeds floor(log[8](2^n))

this means if you try it with char rather than int, you'll get screwed.

on an 8-bit char system, CHAR_BIT/3+2=4, which is one fewer byte than
needed for SCHAR_MIN, "-128" (plus null terminator).

Richard Heathfield

 12-09-2005
Dag-Erling Smørgrav said:

> Richard Heathfield <(E-Mail Removed)> writes:
>> I can explain it easily enough, for the simple reason that I
>> invented it.

>
> That's a bit presumptive, don't you think?

Yes. Hence the qualification, which you snipped.

> It's an obvious solution to anyone who understands logarithms.

Lots of things are obvious even to people who don't. That doesn't stop
software companies from patenting them.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Dag-Erling Smørgrav

 12-09-2005
Jordan Abel <(E-Mail Removed)> writes:
> this means if you try it with char rather than int, you'll get screwed.

I never intended it to be used for char. I was relying on the fact
that int cannot be smaller than 16 bits.

DES
--
Dag-Erling Smørgrav - (E-Mail Removed)

Jordan Abel

 12-09-2005
On 2005-12-09, Dag-Erling Smørgrav <(E-Mail Removed)> wrote:
> Jordan Abel <(E-Mail Removed)> writes:
>> this means if you try it with char rather than int, you'll get screwed.

>
> I never intended it to be used for char. I was relying on the fact
> that int cannot be smaller than 16 bits.

well, it still has nothing to do with two's complement. The extra factor
is still useful if you intend to use the same buffer for octal - and
it's practically free, since it gets calculated at compile time.
