Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Degenerate strcmp

Reply
Thread Tools

Degenerate strcmp

 
 
fishpond@invalid.com
Guest
Posts: n/a
 
      08-17-2007
One way I've seen strcmp(char *s1, char *s2) implemented is: return
immediately if s1==s2 (equality of pointers); otherwise do the usual
thing of searching through the memory at s1 and s2.

Of course the reason for doing this is to save time in case equal
pointers are passed to strcmp. But it seems to me that this could create
an inconsistency in the degenerate case when s1 points to memory that is
not null-terminated, i.e. by some freak chance, all of the memory from
s1 till the computer reaches the end of all its memory pages (however
that works) don't contain a single null byte. In this case, strcmp
should not say that s1 and s2 are "equal strings" since neither is
actually a string (because not null terminated).

Is my thinking correct?

--
Q: "What is the burning question on the mind of every dyslexic
existentialist?"
A: "Is there a dog?"
 
Reply With Quote
 
 
 
 
pete
Guest
Posts: n/a
 
      08-17-2007
http://www.velocityreviews.com/forums/(E-Mail Removed) wrote:
>
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.
>
> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp.
> But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points
> to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


No.
The behavior of strcmp is only defined for
cases when s1 and s2 both point to strings.
If it's not null terminated, it's not a string.

In cases where the behavior of the code is not defined,
the running program can do whatever it wants.
That's the rules of the C programming language.

--
pete
 
Reply With Quote
 
 
 
 
fishpond@invalid.com
Guest
Posts: n/a
 
      08-17-2007
On 17 Aug 2007 at 23:01, pete wrote:
> (E-Mail Removed) wrote:
>>
>> One way I've seen strcmp(char *s1, char *s2) implemented is: return
>> immediately if s1==s2 (equality of pointers); otherwise do the usual
>> thing of searching through the memory at s1 and s2.
>>
>> Of course the reason for doing this is to save time in case equal
>> pointers are passed to strcmp.
>> But it seems to me that this could create
>> an inconsistency in the degenerate case when s1 points
>> to memory that is
>> not null-terminated, i.e. by some freak chance, all of the memory from
>> s1 till the computer reaches the end of all its memory pages (however
>> that works) don't contain a single null byte. In this case, strcmp
>> should not say that s1 and s2 are "equal strings" since neither is
>> actually a string (because not null terminated).
>>
>> Is my thinking correct?

>
> No.
> The behavior of strcmp is only defined for
> cases when s1 and s2 both point to strings.
> If it's not null terminated, it's not a string.


Your right that a string has to be null terminated, but for random
memory maybe by chance there just isn't any null byte.

For example, for the program

main() { printf("%d\n",strlen(malloc(0))); }

this will print out a random number, but in principal the strlen call
might never terminate, if the memory at the pointer returned by malloc
doesn't have any null bytes. (OK, it's very unlikely, but it could
happen in theory...)

>
> In cases where the behavior of the code is not defined,
> the running program can do whatever it wants.
> That's the rules of the C programming language.
>
> --
> pete


--
Hlade's Law:
If you have a difficult task, give it to a lazy person --
they will find an easier way to do it.
 
Reply With Quote
 
pete
Guest
Posts: n/a
 
      08-17-2007
(E-Mail Removed) wrote:
>
> On 17 Aug 2007 at 23:01, pete wrote:


> but for random
> memory maybe by chance there just isn't any null byte.


> > In cases where the behavior of the code is not defined,
> > the running program can do whatever it wants.
> > That's the rules of the C programming language.


Do you understand what I've quoted of myself here?

--
pete
 
Reply With Quote
 
user923005
Guest
Posts: n/a
 
      08-17-2007
On Aug 17, 3:47 pm, (E-Mail Removed) wrote:
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.


Adding an if() test for that is not (in general) a good idea.
A missed branch prediction is expensive.
How often are you really going to do this:
if (strcmp(p,p)==0) call_captain_obvious();
A library function with a quirk like that would make me worry about
the quality of the implementation.

> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp. But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


It is undefined behavior in any case to call strcmp() with addresses
that are not null terminated strings.

> --
> Q: "What is the burning question on the mind of every dyslexic
> existentialist?"
> A: "Is there a dog?"


The actual joke goes:
Q: What does an agnostic, insomniac, dyslexic person do?
A: He lays awake at night, wondering if there is a dog.

 
Reply With Quote
 
pete
Guest
Posts: n/a
 
      08-18-2007
user923005 wrote:
>
> On Aug 17, 3:47 pm, (E-Mail Removed) wrote:
> > One way I've seen strcmp(char *s1, char *s2) implemented is: return
> > immediately if s1==s2 (equality of pointers); otherwise do the usual
> > thing of searching through the memory at s1 and s2.

>
> Adding an if() test for that is not (in general) a good idea.
> A missed branch prediction is expensive.
> How often are you really going to do this:
> if (strcmp(p,p)==0) call_captain_obvious();
> A library function with a quirk like that would make me worry about
> the quality of the implementation.


Anybody who writes code to compare string p with string p,
isn't in a rush.
And that's one of the reasons why I think that it's usually bad
to optimize the degenerate special case
at any cost at all to the general case.

--
pete
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      08-18-2007
(E-Mail Removed) wrote:
> One way I've seen strcmp(char *s1, char *s2) implemented is: return
> immediately if s1==s2 (equality of pointers); otherwise do the usual
> thing of searching through the memory at s1 and s2.
>
> Of course the reason for doing this is to save time in case equal
> pointers are passed to strcmp. But it seems to me that this could create
> an inconsistency in the degenerate case when s1 points to memory that is
> not null-terminated, i.e. by some freak chance, all of the memory from
> s1 till the computer reaches the end of all its memory pages (however
> that works) don't contain a single null byte. In this case, strcmp
> should not say that s1 and s2 are "equal strings" since neither is
> actually a string (because not null terminated).
>
> Is my thinking correct?


What you seem to have missed is that there is no "correct"
behavior in the case you describe: The behavior is undefined
because the arguments are not strings. Returning zero is one
possible behavior, a SIGSEGV is another, a graphic of a nasal
demon whistling "Dixie" while riding backwards on a bicycle
is yet another.

--
Eric Sosman
(E-Mail Removed)lid
 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      08-18-2007
user923005 wrote:
> On Aug 17, 3:47 pm, (E-Mail Removed) wrote:
>> One way I've seen strcmp(char *s1, char *s2) implemented is: return
>> immediately if s1==s2 (equality of pointers); otherwise do the usual
>> thing of searching through the memory at s1 and s2.

>
> Adding an if() test for that is not (in general) a good idea.
> A missed branch prediction is expensive.
> How often are you really going to do this:
> if (strcmp(p,p)==0) call_captain_obvious();
> A library function with a quirk like that would make me worry about
> the quality of the implementation.


I tried to use strcmp(p, p) as a deliberate time-waster
once. I wanted to study the behavior of a sorting function
with "fast" and "slow" user-supplied comparators: the "slow"
one was strcmp(p, p) followed by a call to the "fast" one.
My program adjusted the length of the string at p until I got
a ten-to-one speed ratio.

This worked fine on several systems, but alas! I ran
across one where my setup code kept making p longer and longer
without slowing anything down -- and it turned out that strcmp
was returning zero immediately, as described.

BUT: Was this a stupid test? I don't think so. The
strcmp implementation made a bunch of (highly non-portable)
tests to decide whether it could replace a byte-by-byte loop
with a loop that took bigger, er, bites: two, four, or even
eight at a time. The decision was based on the alignments
of the two operands -- and the "both operands equal" case just
fell out of the alignment tests.

Eventually, I arranged for p to consist entirely of 'X'
and called strcmp(p, p+1), proceeding to call the "fast" method
if ("if") strcmp didn't return zero. Works like a charm.

--
Eric Sosman
(E-Mail Removed)lid
 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      08-18-2007
user923005 wrote:
>

.... snip ...
>
> It is undefined behavior in any case to call strcmp() with addresses
> that are not null terminated strings.


That is a bar to relying on any particular behaviour in such a
circumstance, not a bar to defining the result. Thus the following
should be a satisfactory system implementation of strcmp():

int strcmp(const char *_s1, const char *_s2) {
if (_s1 == _s2) return 0;
else
while (*_s1 && (*_s1 == *_s2)) {
_s1++; _s2++;
}
return (*_s1 > *_s2) - (*_s1 < *_s2);
} /* strcmp */

--
Chuck F (cbfalconer at maineline dot net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net>



--
Posted via a free Usenet account from http://www.teranews.com

 
Reply With Quote
 
Richard Heathfield
Guest
Posts: n/a
 
      08-18-2007
user923005 said:

> On Aug 17, 3:47 pm, (E-Mail Removed) wrote:
>
>> --
>> Q: "What is the burning question on the mind of every dyslexic
>> existentialist?"
>> A: "Is there a dog?"

>
> The actual joke goes:
> Q: What does an agnostic, insomniac, dyslexic person do?
> A: He lays awake at night, wondering if there is a dog.


Can we lay off the dyslexia jokes, please? They're not clever, and
they're not furry.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
segfault when strcmp Robert Mens C Programming 6 10-23-2003 02:53 AM
strcmp muser C++ 6 10-09-2003 08:18 AM
strcmp problem Shane Peck C++ 6 09-22-2003 05:44 PM
strcmp but with '\n' as the terrminator Allan Bruce C Programming 53 07-30-2003 07:38 PM
please help with strcmp() Andrej Hocevar C Programming 3 07-19-2003 04:41 PM



Advertisments