Velocity Reviews > Re: Finding out if a struct is in an array of structs

# Re: Finding out if a struct is in an array of structs

Keith Thompson
Guest
Posts: n/a

 08-11-2009
http://www.velocityreviews.com/forums/(E-Mail Removed) (Richard Harter) writes:
> I would like to know the "best" way to find out if a struct is in
> an array of structs given the address of the struct and the
> address of the array of structs. I want to do this cleanly,
> i.e., with no potential undefined behaviour.
>
> The obvious way is to subtract the address of the struct from the
> address of the array and see if the difference is in range;
> however this is a no-no.
>
> Is there any clean way of doing this other than looping through
> the array and comparing structs field by field?

Not without avoiding undefined behavior.

But if you're willing to assume that the undefined behavior of pointer
comparison doesn't do anything worse than give you an incorrect
result, you can do something like this:

Use "<=" to determine whether the struct's address is within the
range of addresses of the array's element.

If it isn't, you're done; the struct isn't in the array.

If it is, use "-" to compute the index i, and check whether the

The last step is unnecessary if you can assume an ordinary linear
addressing scheme, but could be necessary if, for example, each
pointer is a base+offset pair and "<=" compares only the offsets
(because the base will be the same within an object).

The tricky part is finding a system with a non-linear addressing
scheme so you can test the code.

Of course it will fail on the DS9K.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Phil Carmody
Guest
Posts: n/a

 08-11-2009
Keith Thompson <(E-Mail Removed)> writes:
> (E-Mail Removed) (Richard Harter) writes:
>> I would like to know the "best" way to find out if a struct is in
>> an array of structs given the address of the struct and the
>> address of the array of structs. I want to do this cleanly,
>> i.e., with no potential undefined behaviour.
>>
>> The obvious way is to subtract the address of the struct from the
>> address of the array and see if the difference is in range;
>> however this is a no-no.
>>
>> Is there any clean way of doing this other than looping through
>> the array and comparing structs field by field?

>
> Not without avoiding undefined behavior.
>
> But if you're willing to assume that the undefined behavior of pointer
> comparison doesn't do anything worse than give you an incorrect
> result, you can do something like this:
>
> Use "<=" to determine whether the struct's address is within the
> range of addresses of the array's element.
>
> If it isn't, you're done; the struct isn't in the array.
>
> If it is, use "-" to compute the index i, and check whether the
>
> The last step is unnecessary if you can assume an ordinary linear
> addressing scheme, but could be necessary if, for example, each
> pointer is a base+offset pair and "<=" compares only the offsets
> (because the base will be the same within an object).
>
> The tricky part is finding a system with a non-linear addressing
> scheme so you can test the code.
>
> Of course it will fail on the DS9K.

I should fling this question to c.s.c, but I sometimes wonder
why issues like this weren't made implementation defined rather
than undefined?

Phil
--
If GML was an infant, SGML is the bright youngster far exceeds
expectations and made its parents too proud, but XML is the
before he had sex, which was rape. -- Erik Naggum (1965-2009)

Keith Thompson
Guest
Posts: n/a

 08-12-2009
Kenneth Brody <(E-Mail Removed)> writes:
> Phil Carmody wrote:
>> Keith Thompson <(E-Mail Removed)> writes:
>>> (E-Mail Removed) (Richard Harter) writes:

> [... comparing addresses in different objects ...]
>>> The last step is unnecessary if you can assume an ordinary linear
>>> addressing scheme, but could be necessary if, for example, each
>>> pointer is a base+offset pair and "<=" compares only the offsets
>>> (because the base will be the same within an object).
>>>
>>> The tricky part is finding a system with a non-linear addressing
>>> scheme so you can test the code.

>
> BTDT. Real-mode x86 CPU, using "large model" addressing.
>
>>> Of course it will fail on the DS9K.

>>
>> I should fling this question to c.s.c, but I sometimes wonder
>> why issues like this weren't made implementation defined rather
>> than undefined?

>
> Can an implementation define implementation-defined behavior as "it

Not in general, no. For each instance of unspecified or
implementation-defined behavior, the standard specifies a set
of possible behaviors, and the implementation gets to choose
from that set. (The difference between the two is that if it's
implementation-defined, the implementation has to document its
choice; I-D behavior is actually a subset of unspecified behavior.)

In the case of relational operators on pointers, the standard says
that the comparison (p0 <= p1) invokes undefined behavior if p0 and
p1 don't point into the same object (stated imprecisely for brevity).
It *could* have said that the result is unspecified, but must be
either 0 or 1. But how much should, or could, the standard specify?
Must the result be consistent from one comparison to the next?

And for the problem at hand, my proposed not-quite-portable solution
invokes UB both for the comparison and for pointer arithmetic.

The problem, paraphrased is:

Given a pointer ptr to a struct foo and an array arr of struct
foo, does ptr point to an element of arr?

The portable solution is to use "==" to compare ptr to each &arr[i],
but that's inefficient (O(N)).

My non-portable solution is (untested):

if (ptr >= &arr[0] && ptr < &arr[ARR_MAX]) {
/* ptr *might* point to an element of arr */
ptrdiff_t i = ptr - &arr[0];
if (ptr == &arr[i]) {
/* ptr definitely points to an element of arr */
}
else {
/* ptr doesn't point to an element of arr;
">=" and "<" gave us a false positive */
}
}
else {
/* ptr doesn't point to an element of arr */
}

But for this to work, the behavior of ``ptr - &arr[0]'' has to be
sufficiently benign. If the ">=" and "<" produce a false positive
result, then ``ptr - &arr[0]'' yields a pointer to some invalid
location, with (in principle) arbitrarily bad results either from
computing it or from using it.

Probably the authors of the standard gave this some thought
and decided that covering all the possible corner cases for
implementation-defined behavior was too difficult, given the wide
potential variety of addressing schemes (linear and x86-style
segmented aren't the only possibilities) and the relatively small
benefit.

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Eric Sosman
Guest
Posts: n/a

 08-12-2009
Keith Thompson wrote:
> [...] For each instance of unspecified or
> implementation-defined behavior, the standard specifies a set
> of possible behaviors, and the implementation gets to choose
> from that set. (The difference between the two is that if it's
> implementation-defined, the implementation has to document its
> choice; I-D behavior is actually a subset of unspecified behavior.)

Are you sure of the "choose from that set" part? For
example, when attempting to convert an out-of-range value to
a signed integer type "either the result is implementation-
defined or an implementation-defined signal is raised." The
I-D result could be viewed as a choice from the set of all
possible values of the target type, but where does the Standard
enumerate the I-D signals that might be raised?

(I reject the use of "choice from a set" when the set is
"everything you can imagine, plus all the things you can't,
not enumerated here." That way lies Russel's Paradox, not a
fit topic for a Standard.)

> The problem, paraphrased is:
>
> Given a pointer ptr to a struct foo and an array arr of struct
> foo, does ptr point to an element of arr?
>
> The portable solution is to use "==" to compare ptr to each &arr[i],
> but that's inefficient (O(N)).

It's O(1) with a good choice of the first `i'.

(There's actually a germ of truth behind the smart-ass
joke: It's what makes hash tables fast. Unfortunately, I
can't think of a 100% foolproof way to apply hashing to the
problem. You could develop a hash code from the bytes of
`ptr', but then what? The Standard does not require that
equal pointer values have identical representations, so two
pointers that compare equal could have different hashes.)

--
Eric Sosman
(E-Mail Removed)lid

Keith Thompson
Guest
Posts: n/a

 08-12-2009
Eric Sosman <(E-Mail Removed)> writes:
> Keith Thompson wrote:
>> [...] For each instance of unspecified or
>> implementation-defined behavior, the standard specifies a set
>> of possible behaviors, and the implementation gets to choose
>> from that set. (The difference between the two is that if it's
>> implementation-defined, the implementation has to document its
>> choice; I-D behavior is actually a subset of unspecified behavior.)

>
> Are you sure of the "choose from that set" part? For
> example, when attempting to convert an out-of-range value to
> a signed integer type "either the result is implementation-
> defined or an implementation-defined signal is raised." The
> I-D result could be viewed as a choice from the set of all
> possible values of the target type, but where does the Standard
> enumerate the I-D signals that might be raised?
>
> (I reject the use of "choice from a set" when the set is
> "everything you can imagine, plus all the things you can't,
> not enumerated here." That way lies Russel's Paradox, not a
> fit topic for a Standard.)

The set of signals is finite, and each signal is represented by an int
value (see the signal and raise functions in <signal.h>).

And for a given implementation, the set of signals is likely to be
much smaller than that.

[...]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

jameskuyper
Guest
Posts: n/a

 08-12-2009

Eric Sosman wrote:
> Keith Thompson wrote:
> > [...] For each instance of unspecified or
> > implementation-defined behavior, the standard specifies a set
> > of possible behaviors, and the implementation gets to choose
> > from that set. (The difference between the two is that if it's
> > implementation-defined, the implementation has to document its
> > choice; I-D behavior is actually a subset of unspecified behavior.)

>
> Are you sure of the "choose from that set" part? For
> example, when attempting to convert an out-of-range value to
> a signed integer type "either the result is implementation-
> defined or an implementation-defined signal is raised." The
> I-D result could be viewed as a choice from the set of all
> possible values of the target type, but where does the Standard
> enumerate the I-D signals that might be raised?

It specifies that signal are identified by numbers that are ints and
must be positive so the set is those signals identified by numbers in
the range [1, INT_MAX].

Eric Sosman
Guest
Posts: n/a

 08-12-2009
jameskuyper wrote:
>
> Eric Sosman wrote:
>> Keith Thompson wrote:
>>> [...] For each instance of unspecified or
>>> implementation-defined behavior, the standard specifies a set
>>> of possible behaviors, and the implementation gets to choose
>>> from that set. (The difference between the two is that if it's
>>> implementation-defined, the implementation has to document its
>>> choice; I-D behavior is actually a subset of unspecified behavior.)

>> Are you sure of the "choose from that set" part? For
>> example, when attempting to convert an out-of-range value to
>> a signed integer type "either the result is implementation-
>> defined or an implementation-defined signal is raised." The
>> I-D result could be viewed as a choice from the set of all
>> possible values of the target type, but where does the Standard
>> enumerate the I-D signals that might be raised?

>
> It specifies that signal are identified by numbers that are ints and
> must be positive so the set is those signals identified by numbers in
> the range [1, INT_MAX].

... and INT_MAX is in the range [32767, um...]

--
Eric Sosman
(E-Mail Removed)lid

jameskuyper
Guest
Posts: n/a

 08-12-2009
Eric Sosman wrote:
> jameskuyper wrote:
> >
> > Eric Sosman wrote:
> >> Keith Thompson wrote:
> >>> [...] For each instance of unspecified or
> >>> implementation-defined behavior, the standard specifies a set
> >>> of possible behaviors, and the implementation gets to choose
> >>> from that set. (The difference between the two is that if it's
> >>> implementation-defined, the implementation has to document its
> >>> choice; I-D behavior is actually a subset of unspecified behavior.)
> >> Are you sure of the "choose from that set" part? For
> >> example, when attempting to convert an out-of-range value to
> >> a signed integer type "either the result is implementation-
> >> defined or an implementation-defined signal is raised." The
> >> I-D result could be viewed as a choice from the set of all
> >> possible values of the target type, but where does the Standard
> >> enumerate the I-D signals that might be raised?

> >
> > It specifies that signal are identified by numbers that are ints and
> > must be positive so the set is those signals identified by numbers in
> > the range [1, INT_MAX].

>
> ... and INT_MAX is in the range [32767, um...]

You see that as a problem? It's no more of a problem than the fact
that the value of INT_MAX itself is implementation-defined, and has no
upper limit. The set of options permitted when an implementation
feature is unspecified need not be finite, or even countable, and
choices with respect to different unspecified features of an
implementation might interact with each other. None of that matters,
so long as it is possible to determine whether a given choice is in
the permitted set.

Another way to look at it is that the set of permitted values for both
INT_MAX and the implementation-defined signal is the positive
integers, with the constraints that 65535 <= INT_MAX and the signal
number <= INT_MAX.

Richard Bos
Guest
Posts: n/a

 08-15-2009
Eric Sosman <(E-Mail Removed)> wrote:

> jameskuyper wrote:
> >
> > Eric Sosman wrote:

> >> Are you sure of the "choose from that set" part? For
> >> example, when attempting to convert an out-of-range value to
> >> a signed integer type "either the result is implementation-
> >> defined or an implementation-defined signal is raised." The
> >> I-D result could be viewed as a choice from the set of all
> >> possible values of the target type, but where does the Standard
> >> enumerate the I-D signals that might be raised?

> >
> > It specifies that signal are identified by numbers that are ints and
> > must be positive so the set is those signals identified by numbers in
> > the range [1, INT_MAX].

>
> ... and INT_MAX is in the range [32767, um...]

Where um is large, but for each possible implementation, some finite
number. It's an algorithmically specified set, not an enumerated one;
but it is a set.

Richard

Tim Rentsch
Guest
Posts: n/a

 09-29-2009
Keith Thompson <(E-Mail Removed)> writes:

> Kenneth Brody <(E-Mail Removed)> writes:
>>
>> [Is an implementation allowed complete freedom of choice
>> for implementation-defined behavior.]

>
> Not in general, no. For each instance of unspecified or
> implementation-defined behavior, the standard specifies a set
> of possible behaviors, and the implementation gets to choose
> from that set.

That's right in principle, but the principle isn't always
volatile-qualified object -- it's implementation-defined,
but no set is identified.