Velocity Reviews > Richard Heathfield's setbits()

# Richard Heathfield's setbits()

Alex Vinokur
Guest
Posts: n/a

 05-02-2010
Hi,

Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html

#include <stdio.h>
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
<< n)) << (p + 1 - n));
}

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

According to the C++ standard:
Behavior of shift operators << and >> is undefined if the right
operand is negative, or
greater than or equal to the length in bits of the promoted left
operand.

So, result ((~0 << (p + 1)) may be undefined.

Regards,

Alex Vinokur

Ben Bacarisse
Guest
Posts: n/a

 05-02-2010
Alex Vinokur <(E-Mail Removed)> writes:

> Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html
>
> #include <stdio.h>
> unsigned setbits(unsigned x, int p, int n, unsigned y)
> {
> return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
> << n)) << (p + 1 - n));
> }
>
> setbits(x,p,n,y) returns x with the n bits that begin at position p
> set to the rightmost n bits of y, leaving the other bits unchanged.
>
> It seems that setbits (x, 31, n, y) may produce undefined behavior.

Yes, it does.

> According to the C++ standard:

K&R is about C so you should probably quote the C standard. The intent
matter much, but a solution to a K&R exercise must be assumed to be C.
As it happens, modern C (C99) has diverged from C++ in the case of left
shifting negative numbers but the above is, presumably, not C99.

> Behavior of shift operators << and >> is undefined if the right
> operand is negative, or
> greater than or equal to the length in bits of the promoted left
> operand.
>
> So, result ((~0 << (p + 1)) may be undefined.

It can be undefined for other reasons, though none that matter in the
context of K&R2. ~0 is often signed and negative in which case it is
undefined in C99 but not in C90 or in C++ (at least up to and including
2003). It's safer to shift unsigned types so I'd suggest ~0u.

I think it's hard to do this without knowing the width of the type. I'd
probably write:

unsigned width = CHAR_BIT * sizeof x;
unsigned mask = ~0u >> width - n << p - n + 1;

This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

There is a bit-twiddling version that is one operation shorter:

return x ^ (x ^ a) & mask;

but you'd need to justify the "eh?" this might prompt in the reader.

--
Ben.

Jonathan Lee
Guest
Posts: n/a

 05-02-2010
On May 2, 7:46*am, Alex Vinokur <(E-Mail Removed)> wrote:
> Here is Richard Heathfield's function
> ...
> setbits(x,p,n,y) returns x with the n bits that begin at position p
> set to the rightmost n bits of y, leaving the other bits unchanged.
>
> It seems that setbits (x, 31, n, y) may produce undefined behavior.

Assuming p, n are in [0, INT_BIT)

unsigned mask = (~((~0U) << n)) << p;
or
return x ^ ((x ^ (y << p)) & mask)

--Jonathan

Eric Sosman
Guest
Posts: n/a

 05-02-2010
On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
> [...]
> ~0 is often signed and negative [...]

In C, s/often/always/.

> [...]
> This goes wrong if there are padding bits, but at least we can check for
> that (UINT_MAX will be == ~0u if there are none).

Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
type is an unsigned type, the expression ~E is equivalent to the
maximum value representable in that type minus E." As always, the
settings of any padding bits in the result of ~E (of any arithmetic
operation) are unspecified.

--
Eric Sosman
http://www.velocityreviews.com/forums/(E-Mail Removed)lid

Ben Bacarisse
Guest
Posts: n/a

 05-02-2010
Eric Sosman <(E-Mail Removed)> writes:

> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>> [...]
>> ~0 is often signed and negative [...]

>
> In C, s/often/always/.

I'll take your word for it! I was not sure about ~0 on a 1's complement
machine that supports negative zero. It's called "negative" but is it
less than zero for the purposes of a shift operation? I was not sure.

>> [...]
>> This goes wrong if there are padding bits, but at least we can check for
>> that (UINT_MAX will be == ~0u if there are none).

>
> Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
> type is an unsigned type, the expression ~E is equivalent to the
> maximum value representable in that type minus E." As always, the
> settings of any padding bits in the result of ~E (of any arithmetic
> operation) are unspecified.

Ah, yes, of course. Do you know a good way to determine the width of an
unsigned type? By "good" I probably mean "other than the obvious"
iterative one.

--
Ben.

Eric Sosman
Guest
Posts: n/a

 05-02-2010
On 5/2/2010 2:15 PM, Ben Bacarisse wrote:
> Eric Sosman<(E-Mail Removed)> writes:
>
>> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>>> [...]
>>> ~0 is often signed and negative [...]

>>
>> In C, s/often/always/.

>
> I'll take your word for it! I was not sure about ~0 on a 1's complement
> machine that supports negative zero. It's called "negative" but is it
> less than zero for the purposes of a shift operation? I was not sure.

Ah! Okay, "negative zero" might not be "negative" (since it
compares equal to "positive zero"). Point taken.

>>> [...]
>>> This goes wrong if there are padding bits, but at least we can check for
>>> that (UINT_MAX will be == ~0u if there are none).

>>
>> Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
>> type is an unsigned type, the expression ~E is equivalent to the
>> maximum value representable in that type minus E." As always, the
>> settings of any padding bits in the result of ~E (of any arithmetic
>> operation) are unspecified.

>
> Ah, yes, of course. Do you know a good way to determine the width of an
> unsigned type? By "good" I probably mean "other than the obvious"
> iterative one.

Hallvard B. Furuseth came up with

"

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where
* 0 <= k < 3.2E+10 */
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL
*30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
4-12/((m)%31+3))

Or if you are less paranoid about how large UINTMAX_MAX can get:

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))

.." (Sorry about the line-wrapping.) Dunno whether you'd deem this
good, but it's certainly a jaw-dropper.

--
Eric Sosman
(E-Mail Removed)lid

Ersek, Laszlo
Guest
Posts: n/a

 05-02-2010
On Sun, 2 May 2010, Ben Bacarisse wrote:

> Do you know a good way to determine the width of an unsigned type? By
> "good" I probably mean "other than the obvious" iterative one.

You can count the number of set bits in logarithmic time. (*)

Counting the bits set in (uintmax_t)(unsigned_type)-1 should give the
number of value bits in "unsigned_type". It should cover all unsigned
types, and it should require not more steps than log2(sizeof(uintmax_t) *
(size_t)CHAR_BIT) rounded up or so.

(*) To see this, divide the entire bit-string in adjacent bit-pairs. A
bit-pair can have 0, 1, or 2 bits set. This sum can be represented in two
bits as 00_b, 01_b, or 10_b; there is no carry from this addition. Then
you can add adjacent bit-pairs representing the sums, in order to extend
the sum to the originally adjacent bit-pair. The highest value will be
4_d, or 0100_b, so it will fit into each four-bit nibble. And so on. You
basically do multiple additions at once. Disjunct bit-segments can be used
in parallel, because there is never a carry. As you double the segment
length in each step, the representible range is squared per segment, but
the maximum sum is only doubled; double exponential vs. exponential.

For uin32_t, it should look something like this:

unsigned
countbits32(uint32_t u32)
{
/*
Compute initial sums. A single bit is identical to the number of bits
set in it.
*/

/* 10101010 ... 01010101 ... */
u32 = (u32 & 0xAAAAAAAAlu) >> 1 + (u32 & 0x55555555lu);

/* 11001100 ... 00110011 ... */
u32 = (u32 & 0xCCCCCCCClu) >> 2 + (u32 & 0x33333333lu);

/* 11110000 ... 00001111 ... */
u32 = (u32 & 0xF0F0F0F0lu) >> 4 + (u32 & 0x0F0F0F0Flu);

u32 = (u32 & 0xFF00FF00lu) >> 8 + (u32 & 0x00FF00FFlu);
u32 = (u32 & 0xFFFF0000lu) >> 16 + (u32 & 0x0000FFFFlu);
}

(I probably broke this, so caveat emptor.)

Another way to look at this is a complete binary tree of six levels (five
levels plus root). The leaves are the individual original bits (also
representing the number of bits set in each individual bit). Each internal
node sums the value of its two direct children, representing the number of
all nonzero leaves of the subtree rooted at that node. Leveling up happens
in parallel.

(So much ranting about such a small thing, and even so, probably
incorrect. Sorry!)

Cheers,
lacos

Ben Bacarisse
Guest
Posts: n/a

 05-02-2010
Eric Sosman <(E-Mail Removed)> writes:

> On 5/2/2010 2:15 PM, Ben Bacarisse wrote:

<snip>
>> Do you know a good way to determine the width of an
>> unsigned type? By "good" I probably mean "other than the obvious"
>> iterative one.

>
> Hallvard B. Furuseth came up with
>
> "
>
> /* Number of bits in inttype_MAX, or in any (1<<k)-1 where
> * 0 <= k < 3.2E+10 */
> #define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL
> %0x3fffffffL *30 \
> + (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
> 4-12/((m)%31+3))
>
> Or if you are less paranoid about how large UINTMAX_MAX can get:
>
> /* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
> #define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
>
> ." (Sorry about the line-wrapping.) Dunno whether you'd deem this
> good, but it's certainly a jaw-dropper.

I remember seeing that now... And I was worried about suggesting

x ^ (x ^ y) & mask
for

!!

--
Ben.

Ben Bacarisse
Guest
Posts: n/a

 05-02-2010
Ben Bacarisse <(E-Mail Removed)> writes:

> Alex Vinokur <(E-Mail Removed)> writes:

<snip>
>> setbits(x,p,n,y) returns x with the n bits that begin at position p
>> set to the rightmost n bits of y, leaving the other bits unchanged.

^^^^^^^^^^^^^^^^^^^^^
I missed this bit of the spec. You need y << p - n + 1 in:

> unsigned width = CHAR_BIT * sizeof x;
> unsigned mask = ~0u >> width - n << p - n + 1;

return x & ~mask | (y << p - n + 1) & mask;

but you should also use:

unsigned mask = ~(~0u << n) << p - n + 1;

as this does not need the width of the type.

--
Ben.

Ben Bacarisse
Guest
Posts: n/a

 05-02-2010
Jonathan Lee <(E-Mail Removed)> writes:

> On May 2, 7:46Â*am, Alex Vinokur <(E-Mail Removed)> wrote:
>> Here is Richard Heathfield's function
>> ...
>> setbits(x,p,n,y) returns x with the n bits that begin at position p
>> set to the rightmost n bits of y, leaving the other bits unchanged.
>>
>> It seems that setbits (x, 31, n, y) may produce undefined behavior.

>
> Assuming p, n are in [0, INT_BIT)
>
> unsigned mask = (~((~0U) << n)) << p;

This is a better way to make the mask but I think you've altered what p
means. Alex (based on Richard's code) seems to take p to be the
position of the most significant bit of those changed.

It's simpler (and consistent with the problem wording) to interpret p as
the least significant bit of the changed bits (as you have done) but
some people might be confused by this change of meaning.

It easy to switch to the other interpretation of the problem: substitute
p - n + 1 for p.

> or
> return x ^ ((x ^ (y << p)) & mask)

--
Ben.