Standards & Library functions

 
 
Richard G. Riley
 
      03-19-2006
In another thread it was pointed out that I'd made a booboo with
strcpy: one that I've, if I'm honest, made many times
before. Not out of badness, just because since I first programmed C
back in 1986 or so (and have done so for about 25% of the time since
then), I never really looked at the manpage for strcpy.
This, combined with K&R's famous pointer lessons, which led to 2 or 3
versions of a linear start-to-finish strcpy implementation, meant that on
some occasions I used strcpy to move blocks of memory which may
overlap in a character buffer. Bad. Sloppy. This combined with swapping
between languages probably made me a little careless.

Anyway, my question is this:

Why would any language committee decide to make strcpy work in an
undefined manner for an overlapping object? It seems to me to be as
valid to do the same for memmove.

In the platform/compiler implementation for strcpy you can do
something very quick/CPU specific from start to finish (which doesn't
mind about overlap) or not as you please. If there is overlap and the
instructions being used would corrupt the operation then, and only
then, branch off to a more robust copy using a call to memmove or
something similar. The use of memmove invariably results in a possibly
unnecessary strlen so why not just wrap it all in the strcpy function?

Any comments appreciated
 
Robert Gamble
 
      03-19-2006
Richard G. Riley wrote:
> In another thread it was pointed out that I'd made a booboo with
> strcpy: one that I've, if I'm honest, made many times
> before. Not out of badness, just because since I first programmed C
> back in 1986 or so (and have done so for about 25% of the time since
> then), I never really looked at the manpage for strcpy.
> This, combined with K&R's famous pointer lessons, which led to 2 or 3
> versions of a linear start-to-finish strcpy implementation, meant that on
> some occasions I used strcpy to move blocks of memory which may
> overlap in a character buffer. Bad. Sloppy. This combined with swapping
> between languages probably made me a little careless.
>
> Anyway, my question is this:
>
> Why would any language committee decide to make strcpy work in an
> undefined manner for an overlapping object? It seems to me to be as
> valid to do the same for memmove.


Because the vast majority of the time you are not copying overlapping
objects and it can be a lot more efficient to assume the objects don't
overlap. If the objects might overlap you can always use memmove.
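
A minimal sketch of the memmove route, assuming the caller knows the capacity of the buffer. The helper name `open_gap` and its parameters are made up for illustration; only `strlen` and `memmove` are standard library calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: slide the string held in buf (capacity cap) to the
   right by gap bytes, inside the same buffer. Source and destination
   overlap, so memmove is used rather than strcpy.
   Returns 0 on success, -1 if the moved string would not fit. */
int open_gap(char *buf, size_t cap, size_t gap)
{
    size_t n = strlen(buf) + 1;      /* bytes to move, including the '\0' */

    if (gap > cap || n > cap - gap)
        return -1;

    memmove(buf + gap, buf, n);      /* well-defined even with overlap */
    return 0;
}
```

With `char buf[16] = "hello";`, calling `open_gap(buf, sizeof buf, 2)` leaves "hello" starting at `buf + 2`; the first two bytes are left for the caller to fill in.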

> In the platform/compiler implementation for strcpy you can do
> something very quick/CPU specific from start to finish (which doesn't
> mind about overlap) or not as you please.


But you can usually perform the operation quicker if you don't have to
worry about the possibility of overlap.

> If there is overlap and the
> instructions being used would corrupt the operation then, and only
> then, branch off to a more robust copy using a call to memmove or
> something similar.


Are you suggesting that strcpy try to determine if the objects overlap
and behave accordingly? Why do you think the strcpy function should be
making this decision over the programmer and how exactly would strcpy
determine if the objects do overlap?

> The use of memmove invariably results in a possibly
> unnecessary strlen so why not just wrap it all in the strcpy function?


Why would memmove call strlen? Memmove does not operate on strings, it
operates on a specified number of bytes. The difference between
memmove and memcpy (which has the same overlapping restriction as
strcpy) is that the former operates as if it had first copied the
source into a new object avoiding the issue of overlap.

Robert Gamble

 
Jordan Abel
 
      03-19-2006
On 2006-03-19, Richard G. Riley <(E-Mail Removed)> wrote:
> In another thread it was pointed out that I'd made a booboo with
> strcpy: one that I've, if I'm honest, made many times before.
> Not out of badness, just because since I first programmed C back in
> 1986 or so (and have done so for about 25% of the time since then),
> I never really looked at the manpage for strcpy. This, combined with
> K&R's famous pointer lessons, which led to 2 or 3 versions of a linear
> start-to-finish strcpy implementation, meant that on some occasions I
> used strcpy to move blocks of memory which may overlap in a character
> buffer. Bad. Sloppy. This combined with swapping between languages
> probably made me a little careless.
>
> Anyway, my question is this:
>
> Why would any language committee decide to make strcpy work in an
> undefined manner for an overlapping object? It seems to me to be as
> valid to do the same for memmove.


It's hard (possibly impossible) to implement a single-pass string
copying implementation that will behave well when moving a block from an
earlier position to a later position in the buffer, and it's a rare
enough need that it is senseless to add the extra overhead of
[effectively] strlen+memmove to the function.

And since you seem to want it to act like a "linear start to finish"
implementation, that would mean saying that it's undefined if the
destination overlaps the source to the right, and not undefined if it
overlaps it to the left - and that would be ugly and would basically
mandate a particular implementation.
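
To make the left/right asymmetry concrete, here is a sketch of the kind of start-to-finish loop being discussed. `naive_copy` is a hypothetical stand-in and not the library strcpy, whose behavior with overlapping arguments remains undefined whatever this loop happens to do.

```c
/* Hypothetical start-to-finish copy in the style of the K&R pointer
   exercises. Bytes are copied in increasing address order, one at a time. */
char *naive_copy(char *dst, const char *src)
{
    char *ret = dst;

    /* Copy each byte, stopping after the terminating '\0' is written. */
    while ((*dst++ = *src++) != '\0')
        ;

    return ret;
}

/* Destination to the LEFT of the source (dst < src) in one buffer:
   every byte is read before this loop overwrites it, so the result is
   the expected one.

   Destination to the RIGHT of the source (dst > src, inside the string):
   the loop overwrites the terminator before it is read, never sees a
   '\0', and runs off the end of the buffer. */
```

For instance, with `char buf[] = "  trim me";`, the call `naive_copy(buf, buf + 2)` leaves "trim me" in `buf`. The mirror-image call `naive_copy(buf + 2, buf)` is the case that must not be attempted.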
 
Jordan Abel
 
      03-19-2006
On 2006-03-19, Robert Gamble <(E-Mail Removed)> wrote:
>> If there is overlap and the instructions being used would corrupt the
>> operation then, and only then, branch off to a more robust copy using
>> a call to memmove or something similar.

>
> Are you suggesting that strcpy try to determine if the objects overlap
> and behave accordingly? Why do you think the strcpy function should be
> making this decision over the programmer and how exactly would strcpy
> determine if the objects do overlap?


I think he wants to be able to allow the destination to overlap the
source to the left, while still leaving it undefined if it overlaps to
the right. Since that's what some particular implementation he's used in
the past does.
 
Mark McIntyre
 
      03-19-2006
On Sun, 19 Mar 2006 14:41:47 +0100, in comp.lang.c , "Richard G.
Riley" <(E-Mail Removed)> wrote:

>Why would any language committee decide to make strcpy work in an
>undefined manner for an overlapping object?


Because it's designed for copying strings, and generally speaking, you
want to copy from one string to another, not from a string to itself?

>It seems to me to be as
>valid to do the same for memmove.


Well, memmove is for moving arbitrary chunks of memory, not for
copying strings. I don't consider the two very similar.

Mark McIntyre
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan
 
Eric Sosman
 
      03-19-2006
Richard G. Riley wrote:
> [...]
> Why would any language committee decide to make strcpy work in an
> undefined manner for an overlapping object? It seems to me to be as
> valid to do the same for memmove. [...]


Speed, or at least the possibility of speed. The fewer
corner cases a library function must worry about, the more
freedom the implementation has to use assorted tricks to
make it go faster. Most library functions are not required
to handle overlapping sources and destinations.

memmove() is an oddity in that it is well-defined even if
source overlaps destination. Note that there is also a memcpy()
function that "does the same thing," but whose behavior is *not*
defined in the case of overlap -- an implementor may be able to
provide a "faster" memcpy() and a "safer" memmove(). One might
imagine a strmove() function that bears the same relation to
strcpy() as memmove() does to memcpy(), but there seems to be
little demand for it. Perhaps if you can find enough like-minded
compatriots you could lobby the C0x committee to include such a
thing in the next Standard.
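
A minimal sketch of what such a function could look like if written in user code. No `strmove` exists in the C standard; the name `xstrmove` is invented here, partly because identifiers starting with `str` followed by a lowercase letter are reserved for future library use.

```c
#include <string.h>

/* Hypothetical "strmove": copies the string at src, including its
   terminating '\0', to dst. The two areas may overlap. It costs one pass
   to find the length plus whatever memmove does. */
char *xstrmove(char *dst, const char *src)
{
    return memmove(dst, src, strlen(src) + 1);
}
```

With `char buf[16] = "abcdef";`, the call `xstrmove(buf + 2, buf)` yields "ababcdef" in `buf`, the rightward overlap that a plain start-to-finish copy cannot handle.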

Why this worship of speed? I'm among those who regularly
discourage over-aggressive optimization of code: if you spend
an extra hour researching, developing, and testing a trick that
saves one microsecond per execution, you need to execute the
code 3.6 billion times just to break even. Most code doesn't
execute that many times, so why am I suddenly doing an about-
face and defending aggressively optimized strcpy()?

Because library functions really do have enormous execution
counts. I wouldn't worry about optimizing abort() or setvbuf(),
but strcpy() and sqrt() and getc() and printf() and ... These
functions are used heavily, and by many many programs, so it
makes sense to optimize them aggressively. (The cynic in me says
that the functions used by standard benchmark suites are especially
likely candidates for "creative" optimization -- a colleague at a
PPOE told of a compiler that replaced printf("Hello, world!\n")
with puts("Hello, world!") so as to avoid interpreting a format
string!) At any rate, the Standard committee felt that strcpy()
was one of those functions where aggressive optimization ought
to be allowed, so they granted it a license to ignore certain
corner cases they thought relatively uncommon.

--
Eric Sosman
 
Jordan Abel
 
      03-19-2006
On 2006-03-19, Eric Sosman <(E-Mail Removed)> wrote:
> Richard G. Riley wrote:
>> [...]
>> Why would any language committee decide to make strcpy work in an
>> undefined manner for an overlapping object? It seems to me to be as
>> valid to do the same for memmove. [...]

> Why this worship of speed? I'm among those who regularly
> discourage over-aggressive optimization of code: if you spend an extra
> hour researching, developing, and testing a trick that saves one
> microsecond per execution, you need to execute the code 3.6 billion
> times just to break even. Most code doesn't execute that many times,
> so why am I suddenly doing an about-face and defending aggressively
> optimized strcpy()?
>
> Because library functions really do have enormous execution
> counts. I wouldn't worry about optimizing abort() or setvbuf(), but
> strcpy() and sqrt() and getc() and printf() and ... These functions
> are used heavily, and by many many programs, so it makes sense to
> optimize them aggressively. (The cynic in me says that the functions
> used by standard benchmark suites are especially likely candidates for
> "creative" optimization -- a colleague at a PPOE told of a compiler
> that replaced printf("Hello, world!\n") with puts("Hello, world!") so
> as to avoid interpreting a format string!)


That might sound like benchmark cheating, but given how commonly printf
is used for a static string ending in a newline in the real world, it's
not really. (and, on the other hand, how often is Hello, World used as a
benchmark?) (note: gcc does this, and it does it consistently for both
"any\n" and "%s\n".)

On some systems, perror() is optimized to the point of requiring extra
handling in freopen() to be able to properly deal with reopening stderr.

> At any rate, the Standard committee felt that strcpy() was one of
> those functions where aggressive optimization ought to be allowed, so
> they granted it a license to ignore certain corner cases they thought
> relatively uncommon.


Plus, the trivial implementation, with no optimization at all, DOES
cause undefined behavior if the destination overlaps the source on the
right.
 
Richard G. Riley
 
      03-20-2006
On 2006-03-19, Robert Gamble <(E-Mail Removed)> wrote:
> Richard G. Riley wrote:
>> In another thread it was pointed out that I'd made a booboo with
>> strcpy: one that I've, if I'm honest, made many times
>> before. Not out of badness, just because since I first programmed C
>> back in 1986 or so (and have done so for about 25% of the time since
>> then), I never really looked at the manpage for strcpy.
>> This, combined with K&R's famous pointer lessons, which led to 2 or 3
>> versions of a linear start-to-finish strcpy implementation, meant that on
>> some occasions I used strcpy to move blocks of memory which may
>> overlap in a character buffer. Bad. Sloppy. This combined with swapping
>> between languages probably made me a little careless.
>>
>> Anyway, my question is this:
>>
>> Why would any language committee decide to make strcpy work in an
>> undefined manner for an overlapping object? It seems to me to be as
>> valid to do the same for memmove.

>
> Because the vast majority of the time you are not copying overlapping
> objects and it can be a lot more efficient to assume the objects don't

I think you miss the point. If you don't copy backwards then there is
no overlap issue if the "from" is after the "start": no checks
required. This can be documented rather than a rather offhand "no
overlap at all". Remember that the overhead IS there for memmove: you need a
strlen() call, as I mentioned.

> overlap. If the objects might overlap you can always use memmove.


>
>> In the platform/compiler implementation for strcpy you can do
>> something very quick/CPU specific from start to finish (which doesn't
>> mind about overlap) or not as you please.

>
> But you can usually perform the operation quicker if you don't have to
> worry about the possibility of overlap.


The overlap would only cause an issue in the previously mentioned
case, wouldn't it? Again, there is an overhead in memmove too: the
strlen required.

>
>> If there is overlap and the
>> instructions being used would corrupt the operation then, and only
>> then, branch off to a more robust copy using a call to memmove or
>> something similar.

>
> Are you suggesting that strcpy try to determine if the objects overlap
> and behave accordingly? Why do you think the strcpy function should be
> making this decision over the programmer and how exactly would strcpy
> determine if the objects do overlap?


I don't want it to do anything: I just don't want it to be "undefined"
in the case discussed. Fine, it's not; I just would have thought it
wasn't such a big thing.

>
>> The use of memmove invariably results in a possibly
>> unnecessary strlen so why not just wrap it all in the strcpy function?

>
> Why would memmove call strlen? Memmove does not operate on strings, it
> operates on a specified number of bytes. The difference between
> memmove and memcpy (which has the same overlapping restriction as
> strcpy) is that the former operates as if it had first copied the
> source into a new object avoiding the issue of overlap.


Yes. I know. And that's why I am asking. You need a strlen because to
use memmove you need to know the length of the area being moved.

And anyway, a 2 pointer comparison is hardly a huge overhead is it?

>
> Robert Gamble
>


 
Richard G. Riley
 
      03-20-2006
On 2006-03-20, Richard G. Riley <(E-Mail Removed)> wrote:
> On 2006-03-19, Robert Gamble <(E-Mail Removed)> wrote:
>> Richard G. Riley wrote:
>>> In another thread it was pointed out that I'd made a booboo with
>>> strcpy: one that I've, if I'm honest, made many times
>>> before. Not out of badness, just because since I first programmed C
>>> back in 1986 or so (and have done so for about 25% of the time since
>>> then), I never really looked at the manpage for strcpy.
>>> This, combined with K&R's famous pointer lessons, which led to 2 or 3
>>> versions of a linear start-to-finish strcpy implementation, meant that on
>>> some occasions I used strcpy to move blocks of memory which may
>>> overlap in a character buffer. Bad. Sloppy. This combined with swapping
>>> between languages probably made me a little careless.
>>>
>>> Anyway, my question is this:
>>>
>>> Why would any language committee decide to make strcpy work in an
>>> undefined manner for an overlapping object? It seems to me to be as
>>> valid to do the same for memmove.

>>
>> Because the vast majority of the time you are not copying overlapping
>> objects and it can be a lot more efficient to assume the objects don't
>
> I think you miss the point. If you don't copy backwards then there is
> no overlap issue if the "from" is after the "start": no checks
> required. This can be documented rather than a rather offhand "no
> overlap at all". Remember that the overhead IS there for memmove: you need a
> strlen() call, as I mentioned.




I've reconsidered all this. While I know what I would have done, I can
also see why they did what they did : so no more arguments/discussion
from me. And also apologies for the even more than usual typo content
on the last post.
 
Robert Gamble
 
      03-20-2006
Richard G. Riley wrote:
> On 2006-03-19, Robert Gamble <(E-Mail Removed)> wrote:
> > Richard G. Riley wrote:
> >> In another thread it was pointed out that I'd made a booboo with
> >> strcpy: one that I've, if I'm honest, made many times
> >> before. Not out of badness, just because since I first programmed C
> >> back in 1986 or so (and have done so for about 25% of the time since
> >> then), I never really looked at the manpage for strcpy.
> >> This, combined with K&R's famous pointer lessons, which led to 2 or 3
> >> versions of a linear start-to-finish strcpy implementation, meant that on
> >> some occasions I used strcpy to move blocks of memory which may
> >> overlap in a character buffer. Bad. Sloppy. This combined with swapping
> >> between languages probably made me a little careless.
> >>
> >> Anyway, my question is this:
> >>
> >> Why would any language committee decide to make strcpy work in an
> >> undefined manner for an overlapping object? It seems to me to be as
> >> valid to do the same for memmove.

> >
> > Because the vast majority of the time you are not copying overlapping
> > objects and it can be a lot more efficient to assume the objects don't
>
> I think you miss the point. If you don't copy backwards then there is
> no overlap issue if the "from" is after the "start": no checks
> required. This can be documented rather than a rather offhand "no
> overlap at all". Remember that the overhead IS there for memmove: you need a
> strlen() call, as I mentioned.


So now you also want the Standard to dictate how to implement strcpy?

> > overlap. If the objects might overlap you can always use memmove.

>
> >
> >> In the platform/compiler implementation for strcpy you can do
> >> something very quick/CPU specific from start to finish (which doesn't
> >> mind about overlap) or not as you please.

> >
> > But you can usually perform the operation quicker if you don't have to
> > worry about the possibility of overlap.

>
> The overlap would only cause an issue in the previously mentioned
> case, wouldn't it? Again, there is an overhead in memmove too: the
> strlen required.
>
> >
> >> If there is overlap and the
> >> instructions being used would corrupt the operation then, and only
> >> then, branch off to a more robust copy using a call to memmove or
> >> something similar.

> >
> > Are you suggesting that strcpy try to determine if the objects overlap
> > and behave accordingly? Why do you think the strcpy function should be
> > making this decision over the programmer and how exactly would strcpy
> > determine if the objects do overlap?

>
> I don't want it to do anything: I just don't want it to be "undefined"
> in the case discussed. Fine, it's not; I just would have thought it
> wasn't such a big thing.
>
> >
> >> The use of memmove invariably results in a possibly
> >> unnecessary strlen so why not just wrap it all in the strcpy function?

> >
> > Why would memmove call strlen? Memmove does not operate on strings, it
> > operates on a specified number of bytes. The difference between
> > memmove and memcpy (which has the same overlapping restriction as
> > strcpy) is that the former operates as if it had first copied the
> > source into a new object avoiding the issue of overlap.

>
> Yes. I know. And that's why I am asking. You need a strlen because to
> use memmove you need to know the length of the area being moved.


Yes, if you are using memmove to copy strings and don't know the length
of the source string you will need to use strlen; I didn't get your
point the first time around.

> And anyway, a 2 pointer comparison is hardly a huge overhead is it?


Do you mean to determine if the objects overlap? If the objects don't
overlap then the pointer comparison is undefined. If you do determine
through some other method that the strings don't overlap it still
doesn't mean that the objects which contain the strings don't overlap
which, from my reading of the Standard, would result in undefined
behavior in the current definition of strcpy.
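
As a hedged sketch of what such a check might look like in practice: the helper `ranges_overlap` below is hypothetical, and it converts the pointers to `uintptr_t` rather than comparing them with `<` directly. That avoids the undefined relational comparison, but the integer values are implementation-defined and `uintptr_t` is an optional type, so this is only meaningful on platforms with a flat address space. Note that it also needs both lengths up front, which for strings means a strlen pass.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: do the byte ranges [a, a+na) and [b, b+nb) overlap?
   Assumes a flat address space where converting a pointer to uintptr_t
   preserves ordering; the C standard does not guarantee this. */
int ranges_overlap(const void *a, size_t na, const void *b, size_t nb)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    /* The ranges overlap when the later start lies inside the earlier range. */
    return pa < pb ? (pb - pa < na) : (pa - pb < nb);
}
```

Even where this works, it only answers whether two byte ranges intersect; it says nothing about whether the standard considers the containing objects to overlap.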

Robert Gamble

 