Alan Schroeder wrote:
> The following code produces the expected results on a PC using gcc, but I
> need to port it (or least something similar) to a different
> platform/compiler. I don't think I've introduced any undefined behavior but
> would like another set of eyes to check.
I see only one probable blunder, but there are a few other
peculiarities, too. See below.
>
> #include <math.h>
> #include <stdlib.h>
>
> extern float window;
>
> /*
> * NOTE:
> * This function will return the lowest index into 'data' which contains
> * a value such that data[index]>=value-window, or -1 if a value can not
> * be found such that data[index]>=value-window and
> value+window<=data[index].
> */
"When the comments and the code disagree, both are
probably wrong." Maybe not in this case; I think it's
likely that the comment is just garbled and that the code
works as intended. Still, it's worth examining.
(My objection is that if window is non-negative, as
suggested by the way compare_window() is written, then
the condition value+window<=data[index] implies the condition
data[index]>=value-window. The search actually looks for a
value satisfying a pair of conditions that aren't redundant,
so there's something wrong here.)
> int
> find_first_index(const float *value, const float *data, size_t size, size_t
> width)
> {
> int index;
> float *fptr;
> float minValue;
>
> index = -1;
> fptr = bsearch(value, data, size, width, compare_window);
> if (fptr != NULL) {
> minValue = *value - window;
> index = (int)(fptr-data);
> while (index > 0 && data[index] > minValue) {
> index--;
> }
This looks weird. I can imagine three circumstances:
- width is always sizeof(float), and all the elements of
the data array are searched equally. If this is the case,
width should not be a function argument; what happens if
some caller supplies width=3?
- width is a multiple of sizeof(float), and bsearch looks
only at the first of each group of N=width/sizeof(float)
elements. (Maybe data really contains N-place vectors
strung end-to-end.) If so, then you should be stepping
index by N, not by 1. Also, it would be better to accept
the width argument as N rather than as N*sizeof(float),
and make the adjustment in the bsearch call -- again,
think about the caller providing width=42.
- width is N*sizeof(float) as above, but the intermediate
un-bsearched values are "bunched" within window of the
bsearched values. That's a peculiar enough arrangement
that it deserves a big fat comment.
> if (index) index++; /* We may have gone one too far */
Here's what I think is the blunder. If you bsearch for a
value that is exactly window greater than one of the array
elements, bsearch may find that array element. ("May," not
"will," because it could find a nearby element if they're
spaced less than window apart.) Then the while loop will not
iterate, and index will be as it was when calculated from the
bsearch result -- and then if index>0 you'll increment it.
This will violate the "lowest index" part of the function's
contract, and may also violate the range conditions (if the
original data[index]==value-window, then data[++index] might
equal value+1000*window.) If bsearch finds the topmost array
element, index is computed as size-1 and perhaps incremented
to size, which smells very much like an off-the-end error.
(If the "active" part of the data array is supposed to be
followed by one or more "inactive" values, that'a another
special situation deserving of a comment.)
> }
> return index;
> }
>
> /*
> * NOTE:
> * **ONLY** use this function to sort an array of floats; it won't work
> * as expected if searching for a arbitrary value.
> */
> int
> compare_exact(const void *e1, const void *e2)
> {
> const float *f1 = e1;
> const float *f2 = e2;
> float delta;
>
> delta = *f1 - *f2;
Possible overflow here if the array contains large positive
and large negative values. Possible loss of significance, which
I think is probably harmless in this setting (but which I haven't
studied thoroughly).
> if (delta < 0.0) return -1 ;
> if (delta > 0.0) return 1;
> return 0;
Consider avoiding all those worries about overflow and
significance loss by sticking to the comparison operators and
getting rid of the subtraction. You could write simply
if (*f1 < *f2) return -1;
if (*f1 > *f2) return +1;
return 0;
or you might opt for a one-liner only a C addict could love
return (*f1 > *f2) - (*f1 < *f2);
> }
>
> /*
> * This function can be use with 'bsearch' to find a float value within
> * a given +/- window
> */
> int
> compare_window(const void *e1, const void *e2)
> {
> const float *f1 = e1;
> const float *f2 = e2;
> float delta;
>
> delta = *f1 - *f2;
Possible overflow, as above.
> if (fabs(delta) <= window) return 0;
>
> if (delta < 0.0) return -1;
> if (delta > 0.0) return 1;
> /* NOTE: We should **NEVER** get here */
If you're so certain, call abort().
Better, whenever the arrangement of your tests produces
a branch that cannot be taken, it's usually a sign that there
is something amiss with the branching logic. In this case,
you've observed that one of the tests above must fire, which
means there's no reason to make the third test: the fact that
the first two didn't fire makes the result of the third a
foregone conclusion, right?
Consider rearranging the tests, perhaps along these lines:
if (delta < -window) return -1;
if (delta > +window) return +1;
return 0;
or the C addict's methadone
return (*f1 >= *f2-window) - (*f1 <= *f2+window);
> return 0;
> }
If I were writing this, I think I'd just forget about
bsearch and write my own binary search in open code. Yes,
they'd both be "binary search" -- but bsearch is a binary
search specialized for strict equality, which isn't what
you want. You're not guilty of using a hammer to drive
screws, but you may be using a tack-hammer when you ought
to use a ball-peen hammer.
--
Eric Sosman
lid