William Ahern wrote:
> Micah Cowan wrote:
>> William Ahern wrote:
>>> Knuth and Bernstein haven't written many checks.
>
>> http://en.wikipedia.org/wiki/Knuth_reward_check
>
>> ...As of March 2005, the total value of the checks signed by Knuth was
>> over $20,000...
>
> But how many of those are for MIX and other errors in his books? I meant to
> refer to things like TeX, parts of which are written in C.
Are they? I don't think he wrote any of TeX at all in C. He wrote it all
in Pascal (or, to be more accurate, he wrote it in WEB, which compiles
to Pascal).
The resulting Pascal, these days, is generally fed to a program that
compiles _that_ to C. Parts of teTeX may have been written in CWEB,
maybe, but not by him.
(I didn't mean any of that to imply that it's safer _because_ he wrote
it in Pascal rather than C, I was just disputing whether it was the case.)
>> I agree with Yevgen's general point that it is far too difficult to
>> write correct C programs. Even doing it 80-90% of the time, as most
>> regulars here can probably manage, is itself a noteworthy accomplishment.
>
> Writing a "hello world" program is harder in C than in Bourne shell, and
> harder still in an assembly language.
>
> On the flip side, in a simple "hello world" program string handling doesn't
> predominate, and you may very well have persuasive reasons for writing it in
> C rather than in Bourne shell.
I don't think anyone was talking about how much work it might be to code
in C. What we were talking about was how hard it is to code _safely_ in
C. It's an entirely different question. I don't think a "Hello world"
program's safety is appreciably harder to achieve in C than it is in sh.
More complex programs are a different issue.
And of course there are reasons for choosing C over other implementation
platforms (if that weren't the case, would I be a C programmer?).
>> I think that part of being a good programmer, then, is to limit the
>> opportunities you have to make those mistakes. Set up frameworks to do
>> all the "good habit" stuff for you, so that you don't have to be
>> constantly avoiding "bad habit" stuff yourself (if you have to avoid a
>> mistake 999 times, the 1000th time you may fail to avoid it). This is
>> why, when it matters, many programs and packages will use their own
>> string-handling frameworks that do exactly that. The better you
>> encapsulate/hide away the details of managing buffer sizes, resizing,
>> concatenation, comparison, etc, the more you can focus on doing other
>> things.
>
> I agree. strlcpy(), though, fills in inevitable gaps between the standard
> interfaces, traditional string handling, and whatever design or manner of
> approaching the issue one takes. Seems to me that's as good a reason as any
> to include strlcpy(). On top of the fact, and more to the point, that it
> encapsulates the _minimal_ exact code one would normally and rightly employ
> in these situations.
Well, no, it doesn't. strcpy() plus a buffer check does. strlcpy() adds
one more thing: copying what it can of src to dst, regardless of whether
there was enough space for all of it, or whether that's what was wanted.
This has _never_ been what I want (usually, like Yevgen, I want to
allocate more space). I can't say it will never _be_ what I want, and I
know it's sometimes what others (apparently, including yourself) have
needed. Being constrained by output limits is a legitimate case. Being
constrained by input limits, IMO, isn't a good one ("be liberal in what
you accept").
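To make the "strcpy() plus a buffer check" case concrete, here is a minimal sketch of the pattern where a failed check leads to allocating more space instead of truncating. The helper name copy_or_grow() and its signature are made up for illustration; it is one way to write this, not a standard interface.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Copy src into *dst, growing *dst with realloc() when it is too small.
 * *dstsize is the current capacity in bytes and is updated on growth.
 * Returns 0 on success, -1 if allocation fails (*dst is left untouched).
 * Hypothetical helper for illustration only.
 */
int copy_or_grow(char **dst, size_t *dstsize, const char *src)
{
    size_t need = strlen(src) + 1;      /* bytes required, including '\0' */

    if (need > *dstsize) {              /* the explicit buffer check */
        char *p = realloc(*dst, need);  /* our choice here: grow, don't truncate */
        if (p == NULL)
            return -1;
        *dst = p;
        *dstsize = need;
    }
    strcpy(*dst, src);                  /* need <= *dstsize was just ensured */
    return 0;
}
```

The point is the branch after the check: the caller decides what happens there (grow, reject, report an error), whereas strlcpy() has already decided on truncation.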
Even with your example of RFC limits, most such limits are within the
context of mechanisms that provide ways to represent entities that do
not match those constraints. For instance, if I need to force arbitrary
text files to meet the constraints of RFC 2822, I may be using a fixed
line-buffer size, but I'm sure as hell not using strlcpy() to meet that
constraint. I'd be using quoted-printable or somesuch, instead.
And even if I'm writing an old-style tarfile with fixed block sizes and
a maximum filename length, I'd _still_ probably want to ensure I
generate a unique filename, rather than blindly truncating it.
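As a rough illustration of truncating "non-naively", the sketch below keeps a prefix of an over-long name and appends a distinguishing "~<n>" suffix. The function shorten_name(), the suffix format, and the counter parameter are all assumptions made up for this example; a real archiver would also have to check each candidate against the names already in use.

```c
#include <stdio.h>
#include <string.h>

/*
 * Fit name into out, which has room for outsize bytes including '\0'.
 * If name fits, it is copied verbatim. Otherwise a prefix is kept and
 * "~<n>" is appended, so different values of n give different results.
 * Returns 0 on success, -1 if out cannot hold even the suffix.
 * Hypothetical helper for illustration only.
 */
int shorten_name(char *out, size_t outsize, const char *name, unsigned n)
{
    char suffix[16];
    size_t len = strlen(name);
    size_t keep;
    int slen;

    if (outsize == 0)
        return -1;
    if (len < outsize) {                /* fits: no truncation needed */
        memcpy(out, name, len + 1);
        return 0;
    }
    slen = snprintf(suffix, sizeof suffix, "~%u", n);
    if (slen < 0 || (size_t)slen >= sizeof suffix || (size_t)slen >= outsize)
        return -1;
    keep = outsize - 1 - (size_t)slen;  /* prefix bytes that still fit */
    memcpy(out, name, keep);
    memcpy(out + keep, suffix, (size_t)slen + 1);
    return 0;
}
```

A caller would typically retry with n = 1, 2, 3, ... until the result collides with nothing already written.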
In short, I rarely want to truncate, and when I _do_, I rarely want to
do it naively (as strlcat() will do).
I'm not against its inclusion, I just think its utility has been _way_
overblown.
And none of this has anything to do with the OP's actual question, which
was whether he'd been misled when people told him to always use
strlcpy() in preference to strcpy(). To which the answer, hopefully
obvious by now, is _yes_, he was misled. The utility of strcpy() is
_far_ more general than that of strlcpy().
And, while strlcpy() may be better than strcpy() for those limited
situations where you want a naive truncation (and don't mind its limited
portability), I don't see any basis for the claim that strlcpy() is
_safer_ than strcpy() (which, after all, is the basis for the claim that
you should always use it in preference to strcpy()). It is precisely as
easy to remember to use strlcpy() instead of strcpy(), as it is to
remember to check the buffer size before you strcpy() (the latter,
though, still gives you more options about what to do after the check
fails).
>> All that being said, I fail to see how strlcpy() or strcpy_s() help the
>> matter much. They aren't appreciably easier to use correctly, by which I
>> mean that they are approximately as prone to "bad habit" problems as
>> strcpy() is. They certainly don't hide the details of managing buffer
>> sizes, and you still have that opportunity to mess up on that 1000th
>> time you use it.
>
> That's an impossible criterion. No C library, IMO, can hide the details of
> buffer (aka memory, aka resource) management in C

    #include <stddef.h>   /* size_t */

    struct allocator {
        void *(*a_malloc)(void *, size_t);
        void *(*a_realloc)(void *, void *, size_t);
        void (*a_free)(void *, void *);
        void *data;       /* passed as the first argument to each callback */
    };

    struct str *str_new(struct allocator *);
    struct str *str_cat(struct allocator *, struct str *, struct str *);
    void str_del(struct allocator *, struct str **);

... etc., etc. I imagine there'd also be versions of these same
functions that don't take the initial allocator, and just use a default one.
IMO, C++'s string classes (and many others in the standard C++ library)
handle the allocation problem in a quite general and elegant manner.
Surely a C library could emulate something similar, even if the syntax
were somewhat clunkier?
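For what it's worth, here is one way the interface outlined above might be filled in. Everything beyond the declarations already given is an assumption: the layout of struct str, the malloc-backed default_allocator, and the extra str_from() constructor are invented for this sketch, and str_cat() is taken to append its third argument to its second.

```c
#include <stdlib.h>
#include <string.h>

struct allocator {
    void *(*a_malloc)(void *, size_t);
    void *(*a_realloc)(void *, void *, size_t);
    void (*a_free)(void *, void *);
    void *data;
};

/* Assumed layout: the buffer travels together with its length and capacity. */
struct str {
    char *buf;
    size_t len;   /* bytes in use, excluding the terminating '\0' */
    size_t cap;   /* bytes allocated for buf */
};

/* Default allocator: thin wrappers around the C library. */
static void *def_malloc(void *d, size_t n) { (void)d; return malloc(n); }
static void *def_realloc(void *d, void *p, size_t n) { (void)d; return realloc(p, n); }
static void def_free(void *d, void *p) { (void)d; free(p); }
struct allocator default_allocator = { def_malloc, def_realloc, def_free, NULL };

/* Hypothetical constructor from a C string; returns NULL on failure. */
struct str *str_from(struct allocator *a, const char *cstr)
{
    size_t len = strlen(cstr);
    struct str *s = a->a_malloc(a->data, sizeof *s);
    if (s == NULL)
        return NULL;
    s->buf = a->a_malloc(a->data, len + 1);
    if (s->buf == NULL) {
        a->a_free(a->data, s);
        return NULL;
    }
    memcpy(s->buf, cstr, len + 1);
    s->len = len;
    s->cap = len + 1;
    return s;
}

struct str *str_new(struct allocator *a)
{
    return str_from(a, "");
}

/* Append tail to dst. Returns dst, or NULL on failure (dst unchanged). */
struct str *str_cat(struct allocator *a, struct str *dst, struct str *tail)
{
    size_t need = dst->len + tail->len + 1;
    if (need > dst->cap) {              /* the only size check, done in one place */
        char *p = a->a_realloc(a->data, dst->buf, need);
        if (p == NULL)
            return NULL;
        dst->buf = p;
        dst->cap = need;
    }
    /* memmove, not memcpy: dst and tail may be the same string. */
    memmove(dst->buf + dst->len, tail->buf, tail->len);
    dst->len += tail->len;
    dst->buf[dst->len] = '\0';
    return dst;
}

/* Free the string and null the caller's pointer, so it cannot dangle. */
void str_del(struct allocator *a, struct str **sp)
{
    if (sp == NULL || *sp == NULL)
        return;
    a->a_free(a->data, (*sp)->buf);
    a->a_free(a->data, *sp);
    *sp = NULL;
}
```

Callers never touch a size variable, so there is no chance of pairing a buffer with the wrong one, and str_del() taking a struct str ** is what lets it clear the pointer after freeing.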
> and it's not clear to me
> that off-by-ones are substantially more of an issue than NULL or dangling
> pointers.
Both of which can be solved fairly gracefully (to the degree they can be
solved in C) by a library with an interface such as the one I've
outlined. And off-by-ones are a pretty small subset of buffer-size
violations. Forgetting to check, using the size variable for the wrong
buffer, forgetting to initialize the size variable, are all common
mistakes. Most of these can also be solved by a general library; none of
them are solved by using strlcpy() (except "forgetting to check", but as
already mentioned, this isn't a solution, it's an indirection. Instead
of forgetting to check buffer size, it becomes forgetting to use strlcpy()).
> They can only grease the wheels, so to speak. That is, better
> weave the patterns into your code. Encapsulation being one important way to
> accomplish that. But there are many levels of encapsulation, and many/most
> string libraries force you to too high a level of encapsulation for what it's
> worth in many instances; rather than encapsulate they obfuscate.
No argument there.
And I'm not saying that such a library should ever be part of the C
standard (though it might not be terrible, if done as carefully as C++
has done); what I _am_ saying, is that it would go a long way towards
solving the general issue with bounds checking, whereas strlcpy() is
only claimed to do so.
--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer...
http://micah.cowan.name/