Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   C Programming (http://www.velocityreviews.com/forums/f42-c-programming.html)
-   -   Degenerate strcmp (http://www.velocityreviews.com/forums/t530049-degenerate-strcmp.html)

fishpond@invalid.com 08-17-2007 10:47 PM

Degenerate strcmp
 
One way I've seen strcmp(char *s1, char *s2) implemented is: return
immediately if s1==s2 (equality of pointers); otherwise do the usual
thing of searching through the memory at s1 and s2.

Of course the reason for doing this is to save time in case equal
pointers are passed to strcmp. But it seems to me that this could create
an inconsistency in the degenerate case when s1 points to memory that is
not null-terminated, i.e. by some freak chance, all of the memory from
s1 till the computer reaches the end of all its memory pages (however
that works) don't contain a single null byte. In this case, strcmp
should not say that s1 and s2 are "equal strings" since neither is
actually a string (because not null terminated).

Is my thinking correct?

--
Q: "What is the burning question on the mind of every dyslexic
existentialist?"
A: "Is there a dog?"

pete 08-17-2007 11:01 PM

Re: Degenerate strcmp
 
fishpond@invalid.com wrote:
>
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.
>
> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp.
> But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points
> to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


No.
The behavior of strcmp is only defined for
cases when s1 and s2 both point to strings.
If it's not null terminated, it's not a string.

In cases where the behavior of the code is not defined,
the running program can do whatever it wants.
That's the rules of the C programming language.

--
pete

fishpond@invalid.com 08-17-2007 11:25 PM

Re: Degenerate strcmp
 
On 17 Aug 2007 at 23:01, pete wrote:
> fishpond@invalid.com wrote:
>>
>> One way I've seen strcmp(char *s1, char *s2) implemented is: return
>> immediately if s1==s2 (equality of pointers); otherwise do the usual
>> thing of searching through the memory at s1 and s2.
>>
>> Of course the reason for doing this is to save time in case equal
>> pointers are passed to strcmp.
>> But it seems to me that this could create
>> an inconsistency in the degenerate case when s1 points
>> to memory that is
>> not null-terminated, i.e. by some freak chance, all of the memory from
>> s1 till the computer reaches the end of all its memory pages (however
>> that works) don't contain a single null byte. In this case, strcmp
>> should not say that s1 and s2 are "equal strings" since neither is
>> actually a string (because not null terminated).
>>
>> Is my thinking correct?

>
> No.
> The behavior of strcmp is only defined for
> cases when s1 and s2 both point to strings.
> If it's not null terminated, it's not a string.


Your right that a string has to be null terminated, but for random
memory maybe by chance there just isn't any null byte.

For example, for the program

main() { printf("%d\n",strlen(malloc(0))); }

this will print out a random number, but in principal the strlen call
might never terminate, if the memory at the pointer returned by malloc
doesn't have any null bytes. (OK, it's very unlikely, but it could
happen in theory...)

>
> In cases where the behavior of the code is not defined,
> the running program can do whatever it wants.
> That's the rules of the C programming language.
>
> --
> pete


--
Hlade's Law:
If you have a difficult task, give it to a lazy person --
they will find an easier way to do it.

pete 08-17-2007 11:37 PM

Re: Degenerate strcmp
 
fishpond@invalid.com wrote:
>
> On 17 Aug 2007 at 23:01, pete wrote:


> but for random
> memory maybe by chance there just isn't any null byte.


> > In cases where the behavior of the code is not defined,
> > the running program can do whatever it wants.
> > That's the rules of the C programming language.


Do you understand what I've quoted of myself here?

--
pete

user923005 08-17-2007 11:48 PM

Re: Degenerate strcmp
 
On Aug 17, 3:47 pm, fishp...@invalid.com wrote:
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.


Adding an if() test for that is not (in general) a good idea.
A missed branch prediction is expensive.
How often are you really going to do this:
if (strcmp(p,p)==0) call_captain_obvious();
A library function with a quirk like that would make me worry about
the quality of the implementation.

> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp. But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


It is undefined behavior in any case to call strcmp() with addresses
that are not null terminated strings.

> --
> Q: "What is the burning question on the mind of every dyslexic
> existentialist?"
> A: "Is there a dog?"


The actual joke goes:
Q: What does an agnostic, insomniac, dyslexic person do?
A: He lays awake at night, wondering if there is a dog.


pete 08-18-2007 12:01 AM

Re: Degenerate strcmp
 
user923005 wrote:
>
> On Aug 17, 3:47 pm, fishp...@invalid.com wrote:
> > One way I've seen strcmp(char *s1, char *s2) implemented is: return
> > immediately if s1==s2 (equality of pointers); otherwise do the usual
> > thing of searching through the memory at s1 and s2.

>
> Adding an if() test for that is not (in general) a good idea.
> A missed branch prediction is expensive.
> How often are you really going to do this:
> if (strcmp(p,p)==0) call_captain_obvious();
> A library function with a quirk like that would make me worry about
> the quality of the implementation.


Anybody who writes code to compare string p with string p,
isn't in a rush.
And that's one of the reasons why I think that it's usually bad
to optimize the degenerate special case
at any cost at all to the general case.

--
pete

Eric Sosman 08-18-2007 01:40 AM

Re: Degenerate strcmp
 
fishpond@invalid.com wrote:
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.
>
> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp. But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


What you seem to have missed is that there is no "correct"
behavior in the case you describe: The behavior is undefined
because the arguments are not strings. Returning zero is one
possible behavior, a SIGSEGV is another, a graphic of a nasal
demon whistling "Dixie" while riding backwards on a bicycle
is yet another.

--
Eric Sosman
esosman@ieee-dot-org.invalid

Eric Sosman 08-18-2007 01:56 AM

Re: Degenerate strcmp
 
user923005 wrote:
> On Aug 17, 3:47 pm, fishp...@invalid.com wrote:
>> One way I've seen strcmp(char *s1, char *s2) implemented is: return
>> immediately if s1==s2 (equality of pointers); otherwise do the usual
>> thing of searching through the memory at s1 and s2.

>
> Adding an if() test for that is not (in general) a good idea.
> A missed branch prediction is expensive.
> How often are you really going to do this:
> if (strcmp(p,p)==0) call_captain_obvious();
> A library function with a quirk like that would make me worry about
> the quality of the implementation.


I tried to use strcmp(p, p) as a deliberate time-waster
once. I wanted to study the behavior of a sorting function
with "fast" and "slow" user-supplied comparators: the "slow"
one was strcmp(p, p) followed by a call to the "fast" one.
My program adjusted the length of the string at p until I got
a ten-to-one speed ratio.

This worked fine on several systems, but alas! I ran
across one where my setup code kept making p longer and longer
without slowing anything down -- and it turned out that strcmp
was returning zero immediately, as described.

BUT: Was this a stupid test? I don't think so. The
strcmp implementation made a bunch of (highly non-portable)
tests to decide whether it could replace a byte-by-byte loop
with a loop that took bigger, er, bites: two, four, or even
eight at a time. The decision was based on the alignments
of the two operands -- and the "both operands equal" case just
fell out of the alignment tests.

Eventually, I arranged for p to consist entirely of 'X'
and called strcmp(p, p+1), proceeding to call the "fast" method
if ("if") strcmp didn't return zero. Works like a charm.

--
Eric Sosman
esosman@ieee-dot-org.invalid

CBFalconer 08-18-2007 06:17 AM

Re: Degenerate strcmp
 
user923005 wrote:
>

.... snip ...
>
> It is undefined behavior in any case to call strcmp() with addresses
> that are not null terminated strings.


That is a bar to relying on any particular behaviour in such a
circumstance, not a bar to defining the result. Thus the following
should be a satisfactory system implementation of strcmp():

int strcmp(const char *_s1, const char *_s2) {
if (_s1 == _s2) return 0;
else
while (*_s1 && (*_s1 == *_s2)) {
_s1++; _s2++;
}
return (*_s1 > *_s2) - (*_s1 < *_s2);
} /* strcmp */

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>



--
Posted via a free Usenet account from http://www.teranews.com


Richard Heathfield 08-18-2007 06:27 AM

Re: Degenerate strcmp
 
user923005 said:

> On Aug 17, 3:47 pm, fishp...@invalid.com wrote:
>
>> --
>> Q: "What is the burning question on the mind of every dyslexic
>> existentialist?"
>> A: "Is there a dog?"

>
> The actual joke goes:
> Q: What does an agnostic, insomniac, dyslexic person do?
> A: He lays awake at night, wondering if there is a dog.


Can we lay off the dyslexia jokes, please? They're not clever, and
they're not furry.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999


All times are GMT. The time now is 12:39 AM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.