Velocity Reviews > Getting a range of bits from a bit vector

# Getting a range of bits from a bit vector

jacob navia
Guest
Posts: n/a

 03-17-2011
Hi

Suppose you have a bit vector composed of a descriptor and a sequence of
bits.
In descriptor->count you have the number of valid bits, and in
descriptor->contents you have a sequence of bits declared as
unsigned chars. The function "Create()" creates a new vector.

Does this function looks OK to you?

< begin code > ---------------------------------------------------
static BitString *GetRange(BitString *bs,size_t start,size_t end)
{
size_t t,len;
BitString *result;
size_t startbyte,endbyte,shiftamount,bytesToCopy,idx;

if (bs == NULL) {
NullPtrError("GetRange");

return NULL;
}
if (start >= bs->count)
return NULL;
if (start > end) {
t = start;
start = end;
end = t;
}
if (end >= bs->count)
end = bs->count;
len = end-start;
result = Create(len);
if (len == 0)
return result;
result->count = len;
startbyte = start/CHAR_BIT;
endbyte = end/CHAR_BIT;
shiftamount = start&(CHAR_BIT-1);
if (shiftamount == 0) {
/* Optimize this case. We can do just a memory move */
memmove(result->contents,bs->contents+startbyte,
endbyte - startbyte);
}
else {
/* Copy the first byte. Bring the first bit to be copied into
the position zero by shifting right all bits smaller than
the start index */
result->contents[0] = (bs->contents[startbyte] >> shiftamount);
bytesToCopy = (endbyte-startbyte);
idx = 1;
while (bytesToCopy) {
/* Put the low bits of the next byte into the correct
position of the last byte of the result */
unsigned b = (bs->contents[++startbyte] <<
(CHAR_BIT-shiftamount));
result->contents[idx-1] |= b;
/* Put the high bits now in the low position of the result */
b = (bs->contents[startbyte] >> (CHAR_BIT-shiftamount));
result->contents[idx] |= b;
bytesToCopy--;
idx++;
}
}
return result;
}
< end code > ---------------------------------------------------

Ike Naar
Guest
Posts: n/a

 03-17-2011
On 2011-03-17, jacob navia <(E-Mail Removed)> wrote:
> if (start >= bs->count)
> return NULL;
> if (start > end) {
> t = start;
> start = end;
> end = t;
> }

First, I do not like this kind of "fudging", and wonder if
the function would be cleaner if it just would require
``0 <= start <= end <= bs->count'' as a precondition,
returning NULL when the precondition is not satisfied.

Anyway, *if* you decide to swap start/end so that in the end
``start <= end'' holds,
then I think it's cleaner to put the ``start >= bs->count'' check
after the start/end swap, not before it.

Motivation: apparently ``start=10, end=5, bs->count=12'' is okay,
and equivalent to ``start=5, end=10, bs->count=12''.
Then it's a bit odd that ``start=10, end=5, bs->count=8'' would be
an error instead of being equivalent to the (acceptable)
``start=5, end=10, bs->count=8''.

I would also change the error check ``start >= bs-count''
to ``start > bs->count'' because I do not think it is an error to
extract the empty range from the empty bitvector, so the situation
start = end = bs->count = 0 should yield an empty bitvector, not
an error.

Juha Niskanen
Guest
Posts: n/a

 03-17-2011
On 2011-03-17, jacob navia <(E-Mail Removed)> wrote:
>
> if (start >= bs->count)
> return NULL;

I would use assert() here and reserve NULL return value to indicate
potentially recoverable condition like memory exhaustion.

> if (start > end) {
> t = start;
> start = end;
> end = t;
> }

Why not simply assert here instead of swapping endpoints? If caller can't
pass arguments in correct order, he might be doing something else wrong
as well. It would be best to detect potential programmer logic errors as
early as possible.

> result = Create(len);
> if (len == 0)
> return result;

I think you meant

if (result == NULL)
return result;

Or if intent was to return an empty bitvector when len==0, how does Create()
fail when len!=0 and it cannot allocate memory?

> result->count = len;

So the Create() did not fully construct a new bitvector after all? If it did,
then this assignment is redundant. If it did not, then you returned a partially
constructed empty bitvector earlier in len==0 case.

> shiftamount = start&(CHAR_BIT-1);

Do you need to support machines where CHAR_BIT is not a power of two?

--
Juha Niskanen

(For email, remove NOSPAM)

BartC
Guest
Posts: n/a

 03-17-2011
"jacob navia" <(E-Mail Removed)> wrote in message
news:ilsctm\$kqn\$(E-Mail Removed)...

> Suppose you have a bit vector composed of a descriptor and a sequence of
> bits.
> In descriptor->count you have the number of valid bits, and in
> descriptor->contents you have a sequence of bits declared as
> unsigned chars. The function "Create()" creates a new vector.
>
> Does this function looks OK to you?

> startbyte = start/CHAR_BIT;
> endbyte = end/CHAR_BIT;
> shiftamount = start&(CHAR_BIT-1);
> if (shiftamount == 0) {
> /* Optimize this case. We can do just a memory move */
> memmove(result->contents,bs->contents+startbyte,
> endbyte - startbyte);
> }
> else {
> /* Copy the first byte. Bring the first bit to be copied into
> the position zero by shifting right all bits smaller than
> the start index */

....

Have you thought of having a Bit Vector where the first bit can start
anywhere within the first byte rather than at bit 0 or bit 7?

It sounds untidy, but it means no shifting is needed when extracting a
subrange, while it is also possible to point into the subrange of an
existing bit vector, without extracting.

However, operations between two bit vectors might be more awkward if they
are not aligned the same way, and an offset of 0..7 needs to be maintained
and accounted for in every indexing op.

So it depends on what sort of operations are more frequent. Perhaps
alignment can be an option.

--
Bartc

jacob navia
Guest
Posts: n/a

 03-17-2011
Le 17/03/11 10:11, Ike Naar a écrit :
> On 2011-03-17, jacob navia<(E-Mail Removed)> wrote:
>> if (start>= bs->count)
>> return NULL;
>> if (start> end) {
>> t = start;
>> start = end;
>> end = t;
>> }

>
> First, I do not like this kind of "fudging", and wonder if
> the function would be cleaner if it just would require
> ``0<= start<= end<= bs->count'' as a precondition,
> returning NULL when the precondition is not satisfied.
>

Yes, I thought of that but all functions in the library should be
"forgiving" in the sense that specifying a range from 8 to 3
is not really an error, more a matter of conventions that

> Anyway, *if* you decide to swap start/end so that in the end
> ``start<= end'' holds,
> then I think it's cleaner to put the ``start>= bs->count'' check
> after the start/end swap, not before it.
>

I had it before the swap but then the problem arises that the
end could be larger than the length of the bitstring after the
swap, so I would need to do it again.

> Motivation: apparently ``start=10, end=5, bs->count=12'' is okay,
> and equivalent to ``start=5, end=10, bs->count=12''.
> Then it's a bit odd that ``start=10, end=5, bs->count=8'' would be
> an error instead of being equivalent to the (acceptable)
> ``start=5, end=10, bs->count=8''.
>
> I would also change the error check ``start>= bs-count''
> to ``start> bs->count'' because I do not think it is an error to
> extract the empty range from the empty bitvector, so the situation
> start = end = bs->count = 0 should yield an empty bitvector, not
> an error.

You are right in this last point.

jacob navia
Guest
Posts: n/a

 03-17-2011
Le 17/03/11 11:06, Juha Niskanen a écrit :
> On 2011-03-17, jacob navia<(E-Mail Removed)> wrote:
>>
>> if (start>= bs->count)
>> return NULL;

>
> I would use assert() here and reserve NULL return value to indicate
> potentially recoverable condition like memory exhaustion.

Yes, actually I will issue an error ("Bad arguments") with the standard
way the library answers to errors. But since the return value is fixed
I *have* to return NULL.
>
>> if (start> end) {
>> t = start;
>> start = end;
>> end = t;
>> }

>
> Why not simply assert here instead of swapping endpoints? If caller can't
> pass arguments in correct order, he might be doing something else wrong
> as well. It would be best to detect potential programmer logic errors as
> early as possible.

Well is it really an error to demand a range between 8 and 5?
Doesn't really look like an error to me.

>
>> result = Create(len);
>> if (len == 0)
>> return result;

>
> I think you meant
>
> if (result == NULL)
> return result;
>
> Or if intent was to return an empty bitvector when len==0, how does Create()
> fail when len!=0 and it cannot allocate memory?

No, there is a misunderrstanding here. Create returns a sring that can
hold up to len bits. But the string is initially empty, i.e. count is
always zero. There are two things to keep in mind:

1) The capacity of the string (argument to create())
2) The actual number of bits in the strings, at creation time zero.
>
>> result->count = len;

>
> So the Create() did not fully construct a new bitvector after all? If it did,
> then this assignment is redundant. If it did not, then you returned a partially
> constructed empty bitvector earlier in len==0 case.
>

See above
>> shiftamount = start&(CHAR_BIT-1);

>
> Do you need to support machines where CHAR_BIT is not a power of two?
>

Surely not, sorry. Imagine a byte size of 13... That would break SO MANY
programs that I would not need to worry about.

James Waldby
Guest
Posts: n/a

 03-17-2011
On Thu, 17 Mar 2011 08:23:32 +0100, jacob navia wrote:
....
> startbyte = start/CHAR_BIT;
> endbyte = end/CHAR_BIT;
> shiftamount = start&(CHAR_BIT-1);

....

Even if you plan to ignore the case where CHAR_BIT isn't 2^n, you
still could compute shiftamount = start - startbyte * CHAR_BIT.

--
jiw

Keith Thompson
Guest
Posts: n/a

 03-17-2011
jacob navia <(E-Mail Removed)> writes:
> Le 17/03/11 10:11, Ike Naar a Ã©crit :
>> On 2011-03-17, jacob navia<(E-Mail Removed)> wrote:
>>> if (start>= bs->count)
>>> return NULL;
>>> if (start> end) {
>>> t = start;
>>> start = end;
>>> end = t;
>>> }

>>
>> First, I do not like this kind of "fudging", and wonder if
>> the function would be cleaner if it just would require
>> ``0<= start<= end<= bs->count'' as a precondition,
>> returning NULL when the precondition is not satisfied.

>
> Yes, I thought of that but all functions in the library should be
> "forgiving" in the sense that specifying a range from 8 to 3
> is not really an error, more a matter of conventions that

Hmm. Personally, I'm not comfortable with that.

If I writes GetRange(bs, 8, 3), how likely is it that I really
want the range from bit 3 to bit 8? How much more likely is it
that I've made some kind of error (like assuming that the third
argument is a count), one that you've chosen not to diagnose?

IMHO, detecting and handling incorrect calls is nearly as important
as properly handling correct calls.

If you do want to swap the end points like this, I strongly suggest
documenting the behavior. And once people start depending on this,
you're stuck with the decision.

And a minor point: I'd make t local to the block. Actually,
I might write:

if (start > end) {
const size_t old_start = start;
start = end;
end = old_start;
}

(Not that there's anything wrong with the name "t".)

[...]

--
Keith Thompson (The_Other_Keith) http://www.velocityreviews.com/forums/(E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Barry Schwarz
Guest
Posts: n/a

 03-17-2011
On Thu, 17 Mar 2011 08:23:32 +0100, jacob navia <(E-Mail Removed)>
wrote:

>Hi
>
>Suppose you have a bit vector composed of a descriptor and a sequence of
>bits.
>In descriptor->count you have the number of valid bits, and in
>descriptor->contents you have a sequence of bits declared as
>unsigned chars. The function "Create()" creates a new vector.
>
>Does this function looks OK to you?
>
>
>< begin code > ---------------------------------------------------
>static BitString *GetRange(BitString *bs,size_t start,size_t end)

Let bs->count == 24 and let bs->contents point to (or be an array of)
three 8-bit bytes holding the 24 bits. We can designate the 24 bits
as ABCDEFGH IJKLMNOP QRSTUVWX. If we call this function with
start set to 7 and end set to 21, I expect the result to be the 15-bit
sequence occupying two bytes HIJKLMNO PQRSTUV0.

I know that some architectures number the bits in a byte from right to
left but we are dealing with a artificial construct here (effectively
a bit array) and bit 7 should be "next to" bit 8, especially if we are
going to accept that not all bytes are 8-bits. On a 9-bit byte
system, the original data would look like ABCDEFGHI JKLMNOPQR
STUVWXzzz and the result should be HIJKLMNOP QRSTUV000.

The important point is, if you ignore the byte boundary, the sequence
of bits should be the same regardless of the value of CHAR_BIT.

>{
> size_t t,len;
> BitString *result;
> size_t startbyte,endbyte,shiftamount,bytesToCopy,idx;
>
> if (bs == NULL) {
> NullPtrError("GetRange");
>
> return NULL;
> }
> if (start >= bs->count)
> return NULL;
> if (start > end) {
> t = start;
> start = end;
> end = t;
> }
> if (end >= bs->count)
> end = bs->count;
> len = end-start;

You are missing a +1 here. The range 0 to 7 contains 8 bits, not 7.

> result = Create(len);
> if (len == 0)
> return result;

Does Create() initialize the members of the newly allocated
descriptor? If not, result->count is unspecified and when the calling
function attempts to evaluate it the result is undefined. If it does,
it should set result->count to len and eliminate the need for the next
line.

> result->count = len;
> startbyte = start/CHAR_BIT;
> endbyte = end/CHAR_BIT;
> shiftamount = start&(CHAR_BIT-1);

Change to
shiftamount = start % CHAR_BIT;
to handle the case where CHAR_BIT is not a power of 2.

> if (shiftamount == 0) {
> /* Optimize this case. We can do just a memory move */
> memmove(result->contents,bs->contents+startbyte,
> endbyte - startbyte);

You are missing a +1 here also.

> }
> else {
> /* Copy the first byte. Bring the first bit to be copied into
> the position zero by shifting right all bits smaller than
> the start index */
> result->contents[0] = (bs->contents[startbyte] >> shiftamount);

I don't understand this statement at all. It stores bits that should
be discarded. In any event, continuing with the example, contents[0]
contains 0000000A or 0000000AB for the 8-bit and 9-bit cases.

> bytesToCopy = (endbyte-startbyte);
> idx = 1;
> while (bytesToCopy) {
> /* Put the low bits of the next byte into the correct
> position of the last byte of the result */
> unsigned b = (bs->contents[++startbyte] <<
>(CHAR_BIT-shiftamount));

1 - In the first iteration, b contains JKLMNOP0 or JKLMNOP00.
2 - In the second iteration, b contains RSTUVWX0 or UVWXzzz00

> result->contents[idx-1] |= b;

1 - And contents[0] now contains JKLMNOPA or LMNOPQRAB.
2 - And contents[1] now contains RSTUVWXH or UVWXzzzJK.

> /* Put the high bits now in the low position of the result */
> b = (bs->contents[startbyte] >> (CHAR_BIT-shiftamount));

1 - And b now contains 0000000H or 0000000JK

> result->contents[idx] |= b;

1- And contents[1], if Create() initialized it to 0, contains the
same.

> bytesToCopy--;
> idx++;

2 - At the end of the second iteration contents now contains either
JKLMNOPA RSTUVWXH
or
LMNOPQRAB UVWXzzzJK
and I don't think either is what you intended.

> }
> }
> return result;
>}
>< end code > ---------------------------------------------------

--
Remove del for email

Keith Thompson
Guest
Posts: n/a

 03-17-2011
jacob navia <(E-Mail Removed)> writes:
> Le 17/03/11 11:06, Juha Niskanen a Ã©crit :
>> On 2011-03-17, jacob navia<(E-Mail Removed)> wrote:
>>> shiftamount = start&(CHAR_BIT-1);

>>
>> Do you need to support machines where CHAR_BIT is not a power of two?

>
> Surely not, sorry. Imagine a byte size of 13... That would break SO MANY
> programs that I would not need to worry about.

No argument there -- but you might consider enforcing this assumption
in your code, either with an assert or with a #error (if you can
figure out a way to determine at compile time whether CHAR_BIT is
a power of 2).

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"