Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Increasing efficiency in C

Reply
Thread Tools

Increasing efficiency in C

 
 
jacob navia
Guest
Posts: n/a
 
      03-03-2004
As everybody knows, C uses a zero delimited unbounded
pointer for its representation of strings.

This is extremely inefficient because at each query of the
length of the string, the computer starts an unbounded
memory scan searching for a zero that ends the string.

A more efficient representation is:

struct string {
size_t length;
char data[];
};

The length operation becomes just a memory read.
This would considerably speed the programs. The basic
idea is to use a string type that is length prefixed and
allows run-time checking against UB: undefined
behavior.

Comparing strings is speeded up also because when
testing for equality, the first length comparison tells
maybe the whole story with just a couple of
memory reads.

A string like the one described above is not able to
resize itself. Any pointers to it would cease to be valid
when it is resized if the memory allocator is forced to
move memory around. The block where that string was
allocated is bounded by another blocks in memory, and
it is not possible to resize it.

A pointer ( an indirect representation) costs a sizeof(void *)
but allows to resize strings without invalidating the pointers
to them.

struct string {
size_t length;
char *data;
};

There is no compelling reason to choose one or the other.
It depends on the application. In any case, the standard
library could be complemented by
Strcmp
Strcpy
etc., all using length prefixed strings.

Syntactic sugar.

I have added some sugar to this coffee. I always liked coffee
with a bit of sugar. I feel that is too acid without it.

Current strings are used using the [ ] notation. This strings
could have the same privilege isn't it?

The language extension I propose is that the user has the right to
define the operation [ ] for any data type he/she wishes.

Not a big deal for today's compilers.

Length checked strings can then use:

String s;
....
s[2] = 'a';

I think I am proposing the obvious.

Do you agree?

jacob


 
Reply With Quote
 
 
 
 
Mike Wahler
Guest
Posts: n/a
 
      03-03-2004
"jacob navia" <> wrote in message
news:c25kuu$rae$...

> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.
>
> Not a big deal for today's compilers.
>
> Length checked strings can then use:
>
> String s;
> ...
> s[2] = 'a';
>
> I think I am proposing the obvious.


I think you're proposing C++. Rather than try to 'reinvent' it,
I just use it.

-Mike


 
Reply With Quote
 
 
 
 
jacob navia
Guest
Posts: n/a
 
      03-03-2004

"Mike Wahler" <> a écrit dans le message de
news:LMs1c.17672$ hlink.net...
> "jacob navia" <> wrote in message
> news:c25kuu$rae$...
>
>
> I think you're proposing C++. Rather than try to 'reinvent' it,
> I just use it.


Well I can't use it Mike.

Just too complex.

Default instantiated template traits?

No thanks. Just characters. What I like of C is that it is not
"object oriented".

It is not oriented at all. It is the programmer that puts
the orientation of the program.

C++ has good ideas, but the complexity of the whole is
so staggering, that actually it is a reminder where that leads,
not knowing when to stop.

The crux of the matter is knowing when to stop. When a feature
becomes a nuisance, and doesn't simplify the task it is better
to drop it.

Syntactic sugar can lead to caries in the teeths. I said I like
a *bit* of sugar. Not five spoonfuls you see?

I want a bit of sugar in my coffee and not some coffee in my
sugar.

jacob



 
Reply With Quote
 
Charles Harrison Caudill
Guest
Posts: n/a
 
      03-03-2004
jacob navia <> wrote:
> As everybody knows, C uses a zero delimited unbounded
> pointer for its representation of strings.


> This is extremely inefficient because at each query of the
> length of the string, the computer starts an unbounded
> memory scan searching for a zero that ends the string.


Correction, strcmp is inefficient, I'd like to see someone produce a more
efficient model for strings using even assembly

> A more efficient representation is:


> struct string {
> size_t length;
> char data[];
> };


I'm not really sure what the question is, you seem to have asked and answered
all in one...

> The language extension I propose is that the user has the right to
> define the operation [ ] for any data type he/she wishes.


foo[x] is the value held in the array foo at location x, not string foo at
location x. strings don't even exist in c, just memory.

> Length checked strings can then use:


> String s;
> ...
> s[2] = 'a';


> I think I am proposing the obvious.


See C++

--
Harrison Caudill | .^ www.hypersphere.org
Computer Science & Physics Double Major | | Me*Me=1
Georgia Institute of Technology | v' I'm just a normal guy
 
Reply With Quote
 
Mike Wahler
Guest
Posts: n/a
 
      03-03-2004
"jacob navia" <> wrote in message
news:c25lvs$pev$...
>
> "Mike Wahler" <> a écrit dans le message de
> news:LMs1c.17672$ hlink.net...
> > "jacob navia" <> wrote in message
> > news:c25kuu$rae$...
> >
> >
> > I think you're proposing C++. Rather than try to 'reinvent' it,
> > I just use it.

>
> Well I can't use it Mike.


Whatever.

> Just too complex.


You needn't use all of it. You seem to be wanting
a 'real' string type, which C++ has, and it's not
at all difficult to use. Actually, if such a type
is needed, imo that's a good enough reason to use C++
(even if everything else is only the common subset
of the two languages).

> Default instantiated template traits?


So don't use 'em.

> No thanks. Just characters. What I like of C is that it is not
> "object oriented".


C++ does not require OO design. This is a very common misconception.

> It is not oriented at all. It is the programmer that puts
> the orientation of the program.


Right. Which is why I find C++ useful for very many things.
(and C as well, and other languages too).

> C++ has good ideas, but the complexity of the whole is
> so staggering, that actually it is a reminder where that leads,
> not knowing when to stop.


I use the parts I find useful, discard the rest.

> The crux of the matter is knowing when to stop. When a feature
> becomes a nuisance, and doesn't simplify the task it is better
> to drop it.


Or ignore it. Simple, huh?

> Syntactic sugar can lead to caries in the teeths. I said I like
> a *bit* of sugar. Not five spoonfuls you see?


Ever hear of self-discipline? Anything can be abused.

> I want a bit of sugar in my coffee and not some coffee in my
> sugar.


Who's got control of the spoon, you or someone else?

I'll stop now, I don't want to be accused of language
advocacy in a group about a different language. (It's
probably too late, though )

-Mike


 
Reply With Quote
 
Leor Zolman
Guest
Posts: n/a
 
      03-03-2004
On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <>
wrote:

>As everybody knows, C uses a zero delimited unbounded
>pointer for its representation of strings.


zero terminated, anyway.

>
>This is extremely inefficient because at each query of the
>length of the string, the computer starts an unbounded
>memory scan searching for a zero that ends the string.


It is "extremely inefficient" only if you're continuously recalculating the
length. For applications where you're not, it is extremely efficient.

>
>A more efficient representation is:
>
>struct string {
> size_t length;
> char data[];
>};


What is that [] about? That's not a legal definition. Are you implying that
a fixed-length array implementation (with an actual size in there) is an
improvement in any significant way over a simple char *? I don't think so.
>
>The length operation becomes just a memory read.
>This would considerably speed the programs. The basic
>idea is to use a string type that is length prefixed and
>allows run-time checking against UB: undefined
>behavior.


Now it is starting to sound like Java.

>
>Comparing strings is speeded up also because when
>testing for equality, the first length comparison tells
>maybe the whole story with just a couple of
>memory reads.


Perhaps a bit; but on average, inequality is determined pretty quick the
conventional way, and equality would actually take /more/ time to
determine. But yes, you might net a teeny bit of an improvement.

>
>A string like the one described above is not able to
>resize itself.


Are we talking about the one with the fixed-length array, or the version
with the mysterious empty brackets? Either way, /nothing/ in C can "resize
itself"...

>Any pointers to it would cease to be valid
>when it is resized if the memory allocator is forced to
>move memory around. The block where that string was
>allocated is bounded by another blocks in memory, and
>it is not possible to resize it.
>
>A pointer ( an indirect representation) costs a sizeof(void *)
>but allows to resize strings without invalidating the pointers
>to them.
>
>struct string {
> size_t length;
> char *data;
>};


This is the classic first C++ class implementation exercise. Thinking about
it yields some good fundamental principles about class design. But, to
achieve a true performance benefit in a string service, it ultimately
requires tailoring the string implementation to the specific circumstances
in which it will be used. There's no magic bullet; the "irrational
exuberance" surrounding the rise-and-fall of reference counted Standard C++
string implementations is a case in point.

>
>There is no compelling reason to choose one or the other.
>It depends on the application. In any case, the standard
>library could be complemented by
>Strcmp
>Strcpy
>etc., all using length prefixed strings.
>
>Syntactic sugar.
>
>I have added some sugar to this coffee. I always liked coffee
>with a bit of sugar. I feel that is too acid without it.
>
>Current strings are used using the [ ] notation. This strings
>could have the same privilege isn't it?
>
>The language extension I propose is that the user has the right to
>define the operation [ ] for any data type he/she wishes.
>
>Not a big deal for today's compilers.


Might be a bigger deal for the C Standards committee

>
>Length checked strings can then use:
>
>String s;
>...
>s[2] = 'a';
>
>I think I am proposing the obvious.
>
>Do you agree?


Mike's right: use C++.
-leor

>
>jacob
>


Leor Zolman
BD Software

www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
 
Reply With Quote
 
Leor Zolman
Guest
Posts: n/a
 
      03-03-2004
On Wed, 03 Mar 2004 22:54:20 GMT, "Mike Wahler" <>
wrote:

>"jacob navia" <> wrote in message
>news:c25lvs$pev$...
>>
>> "Mike Wahler" <> a écrit dans le message de
>> news:LMs1c.17672$ hlink.net...
>> > "jacob navia" <> wrote in message
>> > news:c25kuu$rae$...
>> >
>> >
>> > I think you're proposing C++. Rather than try to 'reinvent' it,
>> > I just use it.

>>
>> Well I can't use it Mike.

>
>Whatever.
>
>> Just too complex.

>
>You needn't use all of it. You seem to be wanting
>a 'real' string type, which C++ has, and it's not
>at all difficult to use. Actually, if such a type
>is needed, imo that's a good enough reason to use C++
>(even if everything else is only the common subset
>of the two languages).
>


Jaocb: There's even a common term used to describe C++ when used only for
those features that are a direct "clean-up" of messiness left over from C's
need to be backward-compatible with earlier incarnations of itself: "A
Better C". I don't think the C++ string class is typically lumped in with
those features (which I'm reluctant to go into here since this isn't a C++
group), but I'd think some more about it before discarding the idea of
using C++ as "C plus strings." I even wrote a CUJ article to help folks
with this very issue:
http://www.bdsoft.com/resources/thinking.html
(The title has "STL" in it, but the article is really about migration of
char *-based code to using strings)
-leor

Leor Zolman
BD Software

www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      03-04-2004

"Leor Zolman" <> a écrit dans le message de
news:...
> On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <>
> wrote:
>
> >As everybody knows, C uses a zero delimited unbounded
> >pointer for its representation of strings.

>
> zero terminated, anyway.


Yes Sir!
Zero terminated and surely NOT zero delimited. What a deep
difference

>
> >
> >This is extremely inefficient because at each query of the
> >length of the string, the computer starts an unbounded
> >memory scan searching for a zero that ends the string.

>
> It is "extremely inefficient" only if you're continuously recalculating

the
> length.


Obviously. And this is a very common use, haven't you
notice it?


> For applications where you're not, it is extremely efficient.


Sorry but this string was once constructed, and the
length was known. Why not keeping this information?

What about the security?

What about the failure modes of unbounded pointers,

>
> >
> >A more efficient representation is:
> >
> >struct string {
> > size_t length;
> > char data[];
> >};

>
> What is that [] about? That's not a legal definition.


C99 introduces variable length arrays. This is standard
notation.


> Are you implying that
> a fixed-length array implementation (with an actual size in there) is an
> improvement in any significant way over a simple char *?


Yes.

1 Length operation is trivial
2 Comparisons for equality are cheaper when the length
of the strings differ. You never know this in C strings
and you have to start scanning for that zero...
3 Bounds checked strings can be implemented.

> I don't think so.


Well. I think so for the reasons above. Can you
maybe go to those reasons in detail?

> >
> >The length operation becomes just a memory read.
> >This would considerably speed the programs. The basic
> >idea is to use a string type that is length prefixed and
> >allows run-time checking against UB: undefined
> >behavior.

>
> Now it is starting to sound like Java.
>


In matters of languages I do not despise any. I am
sorry, I like C but I am not a zealot, and see
C's problems and weakness. A bad string type
is the reason for many bugs we could really get
rid of.

> >
> >Comparing strings is speeded up also because when
> >testing for equality, the first length comparison tells
> >maybe the whole story with just a couple of
> >memory reads.

>
> Perhaps a bit; but on average, inequality is determined pretty quick the
> conventional way, and equality would actually take /more/ time to
> determine. But yes, you might net a teeny bit of an improvement.
>


And also net a big security improvement...

> >
> >A string like the one described above is not able to
> >resize itself.

>
> Are we talking about the one with the fixed-length array, or the version
> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
> itself"...
>

Sorry, I thought realloc was part of C...

> This is the classic first C++ class implementation exercise. Thinking

about
> it yields some good fundamental principles about class design.


Maybe but I do not want any class design. There are no classes
in C. I want strings for holding text. As I said, no
default instantiated template traits. Just chars please.

> But, to
> achieve a true performance benefit in a string service, it ultimately
> requires tailoring the string implementation to the specific circumstances
> in which it will be used. There's no magic bullet; the "irrational
> exuberance" surrounding the rise-and-fall of reference counted Standard

C++
> string implementations is a case in point.
>


Yes, each application has its own needs. That's why I would
propose that the user writes many specialized string
structures, that share a common description.

Length delimited strings are infinitely extensible with other
features.

> Mike's right: use C++.


I answered that to Mike. See my answer in a parallel thread.
I think C is the last not object oriented language around.
That makes it very interesting.

jacob
http://www.cs.virginia.edu/~lcc-win32


 
Reply With Quote
 
Leor Zolman
Guest
Posts: n/a
 
      03-04-2004
On Thu, 4 Mar 2004 01:19:26 +0100, "jacob navia" <>
wrote:

>
>"Leor Zolman" <> a écrit dans le message de
>news:.. .
>> On Wed, 3 Mar 2004 23:07:25 +0100, "jacob navia" <>
>> wrote:
>>
>> >As everybody knows, C uses a zero delimited unbounded
>> >pointer for its representation of strings.

>>
>> zero terminated, anyway.

>
>Yes Sir!
>Zero terminated and surely NOT zero delimited. What a deep
>difference


I think of delimiters as a matched set, terminated as asymmetric. Just
seemed off to use it there, but yes, I'm sure everyone knew what you meant.

>
>>
>> >
>> >This is extremely inefficient because at each query of the
>> >length of the string, the computer starts an unbounded
>> >memory scan searching for a zero that ends the string.

>>
>> It is "extremely inefficient" only if you're continuously recalculating

>the
>> length.

>
>Obviously. And this is a very common use, haven't you
>notice it?


Knowing something about how the strings are going to be used is precisely
what drives the design decision of which flavor to use. When there's going
to be a lot of repeated length testing, that fact may contribute to a
decision against my using plain old char *'s /in that application/.

>
>
>> For applications where you're not, it is extremely efficient.

>
>Sorry but this string was once constructed, and the
>length was known. Why not keeping this information?


Keeping and maintaining it has a spacial and temporal cost. Is it always
justified? Sometimes, probably. Usually? Always?

>
>What about the security?
>
>What about the failure modes of unbounded pointers,


C doesn't provide any automatic protection for these things. The spirit of
C is to let the programmer program them if they're needed. Period.

>
>>
>> >
>> >A more efficient representation is:
>> >
>> >struct string {
>> > size_t length;
>> > char data[];
>> >};

>>
>> What is that [] about? That's not a legal definition.

>
>C99 introduces variable length arrays. This is standard
>notation.


Darn, I'm really going to actually have to write a piece of code using VLAs
some day, so I can at least recognize them when they get used (blush). But
the problem is, I don't like them

>
>
>> Are you implying that
>> a fixed-length array implementation (with an actual size in there) is an
>> improvement in any significant way over a simple char *?

>
>Yes.
>
>1 Length operation is trivial
>2 Comparisons for equality are cheaper when the length
> of the strings differ. You never know this in C strings
> and you have to start scanning for that zero...


....or the first mismatch. If you happen to know that enough of your strings
will be identical for their first several characters /and/ be of different
lengths for this to make a significant difference, you'd have good reason
to use your implementation /in that application/.

>3 Bounds checked strings can be implemented.


They can, but lots of things /can/ be implemented, it is just that C has no
pretense of supporting such things at the core language level. Neither does
C++, for that matter.

>
>> I don't think so.

>
>Well. I think so for the reasons above. Can you
>maybe go to those reasons in detail?


I'm not compelled to, no.

>
>> >
>> >The length operation becomes just a memory read.
>> >This would considerably speed the programs. The basic
>> >idea is to use a string type that is length prefixed and
>> >allows run-time checking against UB: undefined
>> >behavior.

>>
>> Now it is starting to sound like Java.
>>

>
>In matters of languages I do not despise any. I am
>sorry, I like C but I am not a zealot, and see
>C's problems and weakness. A bad string type
>is the reason for many bugs we could really get
>rid of.


Nor do I despise Java (I've even written an article, still available on
line somewhere, outlining why I believe Java makes a great "first"
programming language.) But hand-holding features are just /not/ in C's job
description, I'm sorry.

>
>> >
>> >Comparing strings is speeded up also because when
>> >testing for equality, the first length comparison tells
>> >maybe the whole story with just a couple of
>> >memory reads.

>>
>> Perhaps a bit; but on average, inequality is determined pretty quick the
>> conventional way, and equality would actually take /more/ time to
>> determine. But yes, you might net a teeny bit of an improvement.
>>

>
>And also net a big security improvement...


Which you may or may not want to pay for.

>
>> >
>> >A string like the one described above is not able to
>> >resize itself.

>>
>> Are we talking about the one with the fixed-length array, or the version
>> with the mysterious empty brackets? Either way, /nothing/ in C can "resize
>> itself"...
>>

>Sorry, I thought realloc was part of C...


What I'm saying is that nothing "resizes itself", there has to be user code
to recognize the need, dispatch to the appropriate functions, etc. At any
given point in a design, a C programmer can choose whether or not to do
that stuff. She may choose not to, for reasons that make all the sense in
world for that application. She may not want that overhead forced upon her.

>
>> This is the classic first C++ class implementation exercise. Thinking

>about
>> it yields some good fundamental principles about class design.

>
>Maybe but I do not want any class design. There are no classes
>in C. I want strings for holding text. As I said, no
>default instantiated template traits. Just chars please.


I'm not trying to force class design down your throat, I'm just saying
that "black-box" string management is always going to be either prejudicial
to some quality of the data being operated upon, or middle-of-the road and
thus probably not optimal for /your/ situation, whatever that may be; it
can't be sufficiently general-purpose and really efficient for some special
case...

>
>> But, to
>> achieve a true performance benefit in a string service, it ultimately
>> requires tailoring the string implementation to the specific circumstances
>> in which it will be used. There's no magic bullet; the "irrational
>> exuberance" surrounding the rise-and-fall of reference counted Standard

>C++
>> string implementations is a case in point.
>>

>
>Yes, each application has its own needs. That's why I would
>propose that the user writes many specialized string
>structures, that share a common description.
>
>Length delimited strings are infinitely extensible with other
>features.
>
>> Mike's right: use C++.

>
>I answered that to Mike. See my answer in a parallel thread.
>I think C is the last not object oriented language around.
>That makes it very interesting.
>
>jacob
>http://www.cs.virginia.edu/~lcc-win32
>


Okay. Good luck in your quest,
-leor



Leor Zolman
BD Software

www.bdsoft.com -- On-Site Training in C/C++, Java, Perl & Unix
C++ users: Download BD Software's free STL Error Message
Decryptor at www.bdsoft.com/tools/stlfilt.html
 
Reply With Quote
 
Mike Wahler
Guest
Posts: n/a
 
      03-04-2004
"jacob navia" <> wrote in message
news:c25smg$pqs$...
> > Are we talking about the one with the fixed-length array, or the version
> > with the mysterious empty brackets? Either way, /nothing/ in C can

"resize
> > itself"...
> >

> Sorry, I thought realloc was part of C...


Ever notice that the return value from 'realloc()' is
often not the same as returned by the original 'malloc()' or
'calloc()'? 'realloc()' often (in my experience almost
always, except in the cases of 'trivial' size increases)
does a new allocation followed by a copy and a deallocation.
Not really a 'resizing'.


-Mike


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Increasing the Global Clock value inside the design ? anil VHDL 2 06-17-2005 06:03 AM
Increasing data transfer on a firewall to firewall vpn connection providencebuddy@yahoo.com Cisco 1 06-14-2005 10:20 PM
increasing detail of WebVPN logging Andy Cisco 0 09-14-2004 03:48 PM
Increasing wireless network number =?Utf-8?B?c2FjaGlu?= Wireless Networking 2 09-03-2004 04:11 AM
Increasing alias on 6509 Glenn Cisco 0 12-05-2003 07:31 AM



Advertisments
 



1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57