On Thu, 4 Mar 2004 01:19:26 +0100, "jacob navia" <>
wrote:
>
>"Leor Zolman" <> wrote in message
>news:.. .
>> On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <>
>> wrote:
>>
>> >As everybody knows, C uses a zero delimited unbounded
>> >pointer for its representation of strings.
>>
>> zero terminated, anyway.
>
>Yes Sir!
>Zero terminated and surely NOT zero delimited. What a deep
>difference.
I think of delimiters as coming in matched pairs, and of a terminator as
asymmetric. "Delimited" just seemed off there, but yes, I'm sure everyone
knew what you meant.
>
>>
>> >
>> >This is extremely inefficient because at each query of the
>> >length of the string, the computer starts an unbounded
>> >memory scan searching for a zero that ends the string.
>>
>> It is "extremely inefficient" only if you're continuously recalculating the
>> length.
>
>Obviously. And this is a very common use, haven't you
>noticed it?
Knowing something about how the strings are going to be used is precisely
what drives the design decision of which flavor to use. When there's going
to be a lot of repeated length testing, that fact may contribute to a
decision against my using plain old char *'s /in that application/.
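To make the cost concrete, here is a minimal sketch of the pattern that
hurts, next to the obvious fix of computing the length once. The function
names are made up for illustration, and whether a particular compiler
hoists the strlen() call on its own is not something I'd count on.

```c
#include <stddef.h>
#include <string.h>

/* Counts spaces while calling strlen() in the loop condition. Each call
   rescans the string for the terminating zero, so the loop as written
   does work proportional to the square of the length. */
size_t count_spaces_rescan(const char *s)
{
    size_t n = 0;
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == ' ')
            n++;
    return n;
}

/* Same result, but the length is computed once and kept in a local. */
size_t count_spaces_cached(const char *s)
{
    size_t n = 0;
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] == ' ')
            n++;
    return n;
}
```

With plain char *'s the second form costs nothing extra, which is why the
repeated-scan problem is a usage issue as much as a representation issue.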
>
>
>> For applications where you're not, it is extremely efficient.
>
>Sorry but this string was once constructed, and the
>length was known. Why not keep this information?
Keeping and maintaining it has a spatial and temporal cost. Is it always
justified? Sometimes, probably. Usually? Always?
>
>What about the security?
>
>What about the failure modes of unbounded pointers,
C doesn't provide any automatic protection for these things. The spirit of
C is to let the programmer program them if they're needed. Period.
>
>>
>> >
>> >A more efficient representation is:
>> >
>> >struct string {
>> > size_t length;
>> > char data[];
>> >};
>>
>> What is that [] about? That's not a legal definition.
>
>C99 introduces flexible array members. This is standard
>notation.
Darn, I'm really going to have to write a piece of code using C99's
flexible array members some day, so I can at least recognize them when they
get used (blush). But the problem is, I don't like them.
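For reference, a struct ending in `char data[]` is what C99 calls a
flexible array member: the struct itself has no room for the characters, so
the storage has to be allocated along with it. A minimal sketch of how that
might look follows; string_new is a hypothetical helper name, not anything
from the proposal.

```c
#include <stdlib.h>
#include <string.h>

struct string {
    size_t length;
    char data[];            /* C99 flexible array member */
};

/* Hypothetical helper: allocate header and characters in one block.
   A terminating zero is kept as well, so data can still be passed to
   functions expecting an ordinary C string. Returns NULL on failure. */
struct string *string_new(const char *src)
{
    size_t len = strlen(src);
    struct string *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->length = len;
    memcpy(s->data, src, len + 1);
    return s;
}
```

The caller releases the whole thing with a single free().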
>
>
>> Are you implying that
>> a fixed-length array implementation (with an actual size in there) is an
>> improvement in any significant way over a simple char *?
>
>Yes.
>
>1 Length operation is trivial
>2 Comparisons for equality are cheaper when the length
> of the strings differ. You never know this in C strings
> and you have to start scanning for that zero...
...or the first mismatch. If you happen to know that enough of your strings
will be identical for their first several characters /and/ be of different
lengths for this to make a significant difference, you'd have good reason
to use your implementation /in that application/.
>3 Bounds checked strings can be implemented.
They can, but lots of things /can/ be implemented; it is just that C has no
pretense of supporting such things at the core language level. Neither does
C++, for that matter.
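As a rough illustration of points 2 and 3, here is a sketch of a
length-first equality test and a range-checked character access. The
struct and function names are invented for the example, and a pointer plus
a length is used only to keep it short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a length plus a pointer to the bytes. */
struct counted {
    size_t length;
    const char *data;
};

/* Equality: differing lengths settle the question without touching the
   characters; equal lengths still require comparing every byte. */
bool counted_equal(const struct counted *a, const struct counted *b)
{
    if (a->length != b->length)
        return false;
    return memcmp(a->data, b->data, a->length) == 0;
}

/* Range-checked access: returns the character as an unsigned char value,
   or -1 if the index is out of range. */
int counted_at(const struct counted *s, size_t i)
{
    if (i >= s->length)
        return -1;
    return (unsigned char)s->data[i];
}
```

Note that when the lengths match, the comparison does the same work as
strcmp() plus the extra length test, and every checked access pays for a
branch.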
>
>> I don't think so.
>
>Well. I think so for the reasons above. Can you
>maybe go to those reasons in detail?
I'm not compelled to, no.
>
>> >
>> >The length operation becomes just a memory read.
>> >This would considerably speed the programs. The basic
>> >idea is to use a string type that is length prefixed and
>> >allows run-time checking against UB: undefined
>> >behavior.
>>
>> Now it is starting to sound like Java.
>>
>
>In matters of languages I do not despise any. I am
>sorry, I like C but I am not a zealot, and see
>C's problems and weakness. A bad string type
>is the reason for many bugs we could really get
>rid of.
Nor do I despise Java (I've even written an article, still available
online somewhere, outlining why I believe Java makes a great "first"
programming language). But hand-holding features are just /not/ in C's job
description, I'm sorry.
>
>> >
>> >Comparing strings is also sped up because when
>> >testing for equality, the first length comparison tells
>> >maybe the whole story with just a couple of
>> >memory reads.
>>
>> Perhaps a bit; but on average, inequality is determined pretty quick the
>> conventional way, and equality would actually take /more/ time to
>> determine. But yes, you might net a teeny bit of an improvement.
>>
>
>And also net a big security improvement...
Which you may or may not want to pay for.
>
>> >
>> >A string like the one described above is not able to
>> >resize itself.
>>
>> Are we talking about the one with the fixed-length array, or the version
>> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
>> itself"...
>>
>Sorry, I thought realloc was part of C...
What I'm saying is that nothing "resizes itself"; there has to be user code
to recognize the need, dispatch to the appropriate functions, etc. At any
given point in a design, a C programmer can choose whether or not to do
that stuff. She may choose not to, for reasons that make all the sense in
the world for that application. She may not want that overhead forced upon her.
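The kind of user code I mean looks something like the sketch below: an
append routine that notices the buffer is too small and calls realloc().
The names and the doubling policy are assumptions made for the example, and
size_t overflow is ignored to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer: the struct does not resize itself; the
   code in buf_append has to notice the need and call realloc(). */
struct buf {
    size_t length;      /* characters in use, not counting the zero */
    size_t capacity;    /* bytes allocated for data */
    char *data;
};

/* Appends src. Returns 0 on success, or -1 if realloc() fails, in which
   case the buffer is left unchanged. */
int buf_append(struct buf *b, const char *src)
{
    size_t n = strlen(src);
    size_t need = b->length + n + 1;    /* +1 for the terminating zero */

    if (need > b->capacity) {
        size_t cap = b->capacity ? b->capacity : 16;
        while (cap < need)
            cap *= 2;                   /* assumed growth policy */
        char *p = realloc(b->data, cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->length, src, n + 1);
    b->length += n;
    return 0;
}
```

Starting from a zeroed struct buf, two calls appending "foo" and "bar"
leave "foobar" in data. Every one of those checks is overhead that an
application with fixed-size strings would never need.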
>
>> This is the classic first C++ class implementation exercise. Thinking about
>> it yields some good fundamental principles about class design.
>
>Maybe but I do not want any class design. There are no classes
>in C. I want strings for holding text. As I said, no
>default instantiated template traits. Just chars please.
I'm not trying to force class design down your throat. I'm just saying
that "black-box" string management is always going to be either biased
toward some quality of the data being operated upon, or middle-of-the-road
and thus probably not optimal for /your/ situation, whatever that may be.
It can't be both sufficiently general-purpose and really efficient for some
special case...
>
>> But, to
>> achieve a true performance benefit in a string service, it ultimately
>> requires tailoring the string implementation to the specific circumstances
>> in which it will be used. There's no magic bullet; the "irrational
>> exuberance" surrounding the rise and fall of reference-counted Standard C++
>> string implementations is a case in point.
>>
>
>Yes, each application has its own needs. That's why I would
>propose that the user writes many specialized string
>structures, that share a common description.
>
>Length delimited strings are infinitely extensible with other
>features.
>
>> Mike's right: use C++.
>
>I answered that to Mike. See my answer in a parallel thread.
>I think C is the last non-object-oriented language around.
>That makes it very interesting.
>
>jacob
>http://www.cs.virginia.edu/~lcc-win32
>
Okay. Good luck in your quest,
-leor
Leor Zolman