Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > "The Unreasonable Effectiveness of C" by Damien Katz

Reply
Thread Tools

"The Unreasonable Effectiveness of C" by Damien Katz

 
 
Lynn McGuire
Guest
Posts: n/a
 
      01-20-2013
On 1/19/2013 5:37 PM, Edward A. Falk wrote:
> In article <kd8c3h$2mh$(E-Mail Removed)>, BGB <(E-Mail Removed)> wrote:
>>>
>>>> But most good high-level langauges now include array operations.
>>>
>>>

>>
>> I have seen hardly any languages which allow adding arrays in this way.

>
> Yeah, I was going to say. What languages *do* add arrays this way?
> PL/1? Algol?


Fortran starting with the 1990 spec:

http://research.physics.illinois.edu...nfo/array.html

Lynn


 
Reply With Quote
 
 
 
 
glen herrmannsfeldt
Guest
Posts: n/a
 
      01-20-2013
Edward A. Falk <(E-Mail Removed)> wrote:
> In article <kd8c3h$2mh$(E-Mail Removed)>, BGB <(E-Mail Removed)> wrote:


>>>> But most good high-level langauges now include array operations.


>>I have seen hardly any languages which allow adding arrays in this way.


> Yeah, I was going to say. What languages *do* add arrays this way?
> PL/1? Algol?


I am not sure about ALGOL. PL/I had array operations from the beginning,
Fortran added them in Fortran 90.

Many interpreted languages have array operations, maybe back
to APL. Mathematica, Matlab, R, and similar systems all do.
A google search for "array expressions" shows Jive,
Haskell, and PostGreSQL (maybe SQL in general), and Cython.

PL/I might be the only one with structure expressions.
You can add (or multiply, or just about anything else)
two structures in PL/I for member by member evaluation.

DCL (1 X, 1Y) , 2 A FIXED DECIMAL(5,2), 2 B FLOAT BINARY(10);

(I haven't written many structure declarations in PL/I,
but I believe that is right.

X=1;
Y=2;
PUT SKIP LIST(X+Y);

-- glen
 
Reply With Quote
 
 
 
 
glen herrmannsfeldt
Guest
Posts: n/a
 
      01-20-2013
BGB <(E-Mail Removed)> wrote:

(snip)
> though, there may still be things which C can't do, which may require
> occasional use of real ASM code.


> one thing that is often a hindrance to using C as a good target language
> for compilers is, ironically, its fairly strict adherence to top-level
> block-structuring.


> some C extensions, such as computed goto, can help to a limited extent
> here, but given computed goto can't cross function boundaries (or, by
> extension, boundaries between one compilation unit and another), a limit
> is placed on its use here (IOW: it doesn't scale very well).


longjmp() with an array of jmp_buf should work.

-- glen
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      01-20-2013


"glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
news:kdff3p$tsc$(E-Mail Removed)...
> BGB <(E-Mail Removed)> wrote:
>
> (snip)
>> though, there may still be things which C can't do, which may require
>> occasional use of real ASM code.

>
>> one thing that is often a hindrance to using C as a good target language
>> for compilers is, ironically, its fairly strict adherence to top-level
>> block-structuring.

>
>> some C extensions, such as computed goto, can help to a limited extent
>> here, but given computed goto can't cross function boundaries (or, by
>> extension, boundaries between one compilation unit and another), a limit
>> is placed on its use here (IOW: it doesn't scale very well).

>
> longjmp() with an array of jmp_buf should work.


I use computed goto for threaded code (I assume computed goto is the use of
label pointers). I use it because in my project, alternatives are slower.

Is longjmp (which I've never used) likely to match the speed of an indirect
goto? And is it suitable for doing large numbers of such jumps one after the
other? (With computed gotos, I do around 100 million every second.)

--
Bartc


 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      01-20-2013
On 1/19/2013 6:52 PM, glen herrmannsfeldt wrote:
> BGB <(E-Mail Removed)> wrote:
>
> (snip)
>> though, there may still be things which C can't do, which may require
>> occasional use of real ASM code.

>
>> one thing that is often a hindrance to using C as a good target language
>> for compilers is, ironically, its fairly strict adherence to top-level
>> block-structuring.

>
>> some C extensions, such as computed goto, can help to a limited extent
>> here, but given computed goto can't cross function boundaries (or, by
>> extension, boundaries between one compilation unit and another), a limit
>> is placed on its use here (IOW: it doesn't scale very well).

>
> longjmp() with an array of jmp_buf should work.
>


not sure how one would pull this off exactly...

if you mean having a bunch of "if(setjmp(&arr[N]))goto lbl;", this could
work, but IMO is likely to be slower than splitting things up into
functions and calling through function pointers.


I did have an idea which I am writing up in a spec, which I am calling
"CLIL" for "C-like Intermediate Language", which would include some of
the ideas for non-local computed goto (and calls via labels), but would
otherwise be a language more like "C--" or "B".

basically, it would probably be as part of a JIT backend for my
scripting language (and probably less of a pain than writing a full JIT
directly).

or such...

 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      01-20-2013
On 1/19/2013 7:26 PM, BartC wrote:
>
>
> "glen herrmannsfeldt" <(E-Mail Removed)> wrote in message
> news:kdff3p$tsc$(E-Mail Removed)...
>> BGB <(E-Mail Removed)> wrote:
>>
>> (snip)
>>> though, there may still be things which C can't do, which may require
>>> occasional use of real ASM code.

>>
>>> one thing that is often a hindrance to using C as a good target language
>>> for compilers is, ironically, its fairly strict adherence to top-level
>>> block-structuring.

>>
>>> some C extensions, such as computed goto, can help to a limited extent
>>> here, but given computed goto can't cross function boundaries (or, by
>>> extension, boundaries between one compilation unit and another), a limit
>>> is placed on its use here (IOW: it doesn't scale very well).

>>
>> longjmp() with an array of jmp_buf should work.

>
> I use computed goto for threaded code (I assume computed goto is the use
> of label pointers). I use it because in my project, alternatives are
> slower.
>


yeah. computed goto is the use of label pointers and then using goto to
call them. it is nifty, but sadly, isn't available with MSVC, so I am
mostly using function pointers, but still using threaded code.

this works ok, but still can't reach native speeds.


like, using selection sort (an O(n^2) sort) on a 64k element array still
takes several minutes.

likewise for things like, at present, doing a "fib(40)" or similar via
the recursive Fibonacci function.

but things still take considerably less time with natively compiled code
(along the lines of a matter of seconds, like around 15 seconds or so
for the array sort).

so, around 133M swaps/sec for C, and about 8M swaps/second for
interpreted code. with each swap involving spinning around in a "for()"
loop and comparing array indices.


though all this is "usually good enough", and I have been at least
gradually squeezing more speed out of the plain-C interpreter.


> Is longjmp (which I've never used) likely to match the speed of an
> indirect goto? And is it suitable for doing large numbers of such jumps
> one after the other? (With computed gotos, I do around 100 million every
> second.)
>


I doubt it. "longjmp()" is actually likely to be a bit slower than
either computed goto or a call through a function-pointer, since it
essentially involves restoring registers potentially followed by several
indirect jumps.


AFAICT, the issue is mostly that making an indirect call seems to cause
a pipeline stall or similar, although at least the CPU seems somehow
able to predict its way through a return instruction.


a recent speed-squeezing trick though is breaking the opcodes up into
traces, which execute opcodes in sequence (basically, a trace will
unconditionally execute a chain of instructions, usually terminated via
a jump or an opcode which has a chance of throwing an exception).

for example:
BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
op=op->fcn(ctx, tr->thop);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
op=op->fcn(ctx, op);
return(tr->next);
}

(which execute a fixed number of thread-ops and transfer control to the
next trace).

or, the slightly more generic fallback case:
BSVM_ThreadTrace *BSVM_ThTr_RunDefault(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
int i;
op=tr->thop; i=tr->n_thop-1;
while(i--)
{ op=op->fcn(ctx, op); }
return(tr->end(ctx, tr, op));
}


all this can help some, but still isn't native speeds.


I don't know exactly how many ops per second are executing, but I
suspect it is probably "up there", given I am measuring loops involving
calling C functions (via the FFI) and getting/setting object fields
capable of spinning at a rate of several million iterations per second.


so, luckily, it probably at least isn't *that* slow, even if a JIT or
similar would probably be a bit faster.


 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      01-20-2013
On 1/19/2013 9:05 PM, William Ahern wrote:
> BGB <(E-Mail Removed)> wrote:
> <snip>
>> some C extensions, such as computed goto, can help to a limited extent
>> here, but given computed goto can't cross function boundaries (or, by
>> extension, boundaries between one compilation unit and another), a limit
>> is placed on its use here (IOW: it doesn't scale very well).

> <snip>
>> an alternative compromise could be though if (as a C extension) it were
>> possible to fetch and call label-pointers as if they were function
>> pointers (using the same signature as that of the parent function), and
>> also potentially refer to them from outside the function in question
>> (more like a struct), but currently I am not aware of any compiler which
>> supports this.

>
>> foo.label1(3, 4); //call into foo() via an arbitrary label.
>> fcn=&foo.label1; //get label pointer as function pointer.

>
>> possibly with something inside foo like:
>> "extern label1:"
>> with extern giving a compiler hint that maybe it should generate any
>> glue needed to make this callable.

> <snip>
>
> I was surprised to find that the following GCC hack works:
>
> #include <stdio.h>
>
> static void *One, *Two;
>
> static int foo(void *label) {
> __label__ one, two;
> __attribute__((constructor, used)) void bar(void) {
> One = &&one;
> Two = &&two;
> }
>
> goto *label;
> one:
> return 1;
> two:
> return 2;
> } /* foo() */
>
> int main(void) {
> printf("one: %d\n", foo(One));
> printf("two: %d\n", foo(Two));
>
> return 0;
> } /* main() */
>
> I discuss some of the details in a blog post "Acquiring address of labels
> outside of scope": http://25thandclement.com/~william/blog/
>



well, yes, it works...

the downside is, of course, that you still have to call into the
function (or otherwise be inside the the function) to goto the label
(and have a single massive function to make much use of it).

I am imagining if some of these restrictions were lifted, such that a
language would allow direct inter-procedural goto, and ability to call
directly to labels within a function, so in these regards would be more
like assembler (sort of like a macro-assembler with more C like syntax).


I started writing out some ideas for such a language, but it isn't
really C (the idea ended up involving largely dropping
structured-programming constructs, instead treating functions more as a
way of declaring the active stack-frame layout), ... and would probably
use a simplistic single-pass compiler, and be largely untyped.

(in contrast, C puts considerably more distance between the language and
the underlying ASM code).

still thinking on it though, don't yet know if I will do anything with
the idea.


or such...

 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      01-20-2013
On 1/20/2013 12:32 AM, BGB wrote:
>
> for example:
> BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
> BSVM_ThreadTrace *tr)
> {
> BSVM_ThreadOp *op;
> op=op->fcn(ctx, tr->thop);
> op=op->fcn(ctx, op);
> op=op->fcn(ctx, op);
> op=op->fcn(ctx, op);
> return(tr->next);
> }
>


blarg...

"clever" last-minute optimization kind of made this one critically broken.

fixed version more like:
BSVM_ThreadTrace *BSVM_ThTr_RunBasic4(BSVM_SVMState *ctx,
BSVM_ThreadTrace *tr)
{
BSVM_ThreadOp *op;
op=tr->thop;
op=op->fcn(ctx, op);
...
}

which avoids using an uninitialized variable...

well, nevermind then...


 
Reply With Quote
 
BGB
Guest
Posts: n/a
 
      01-20-2013
On 1/20/2013 2:29 AM, William Ahern wrote:
> BGB <(E-Mail Removed)> wrote:
>> On 1/19/2013 9:05 PM, William Ahern wrote:

> <snip>
>>> I was surprised to find that the following GCC hack works:

> <snip nested constructor function example>
>> well, yes, it works...

>
>> the downside is, of course, that you still have to call into the
>> function (or otherwise be inside the the function) to goto the label
>> (and have a single massive function to make much use of it).

>
> You can actually jump into the function. It should be just as safe (or not
> safe) as that nested constructor function, which is also technically not
> supposed to be invoked outside of the parent's invocation. That's why I was
> surprised that GCC actually set it up for the linker to invoke.
>


I didn't see much that appeared to be outside of normal GCC behavior...

this works mostly because labels exist at compile time, rather than
runtime (as would be the case for auto-variables or similar).

so, in this case, what the nested function is doing is little-different
from a non-nested function, apart from being able to see the labels in
the parent function.


> But by jumping straight into the function you wouldn't have the ability to
> use any auto variables. You could "return", though, by using another
> computed goto in like manner.
>


well, this is going outside of C-land here...


actually, I imagined that it would still allow auto-variables, just in a
very unsafe way:
whatever is the current stack-frame, will be interpreted as if it had
the same layout as that present in the function body.

this would mean the ability to "safely" goto between any two functions
with the same function-arguments and variable declarations, and still
use the variables (it would be undefined-behavior if the declared
variables don't match exactly though when doing so).

basically, the declared variables and arguments list would be taken as
the "active stack-frame layout" at the point where the code is compiled.


so, if the same variables exist by the same name in the same order, ...,
then they may be accessed as such. and, if the frame-layout doesn't
match, well then...



>> I am imagining if some of these restrictions were lifted, such that a
>> language would allow direct inter-procedural goto, and ability to call
>> directly to labels within a function, so in these regards would be more
>> like assembler (sort of like a macro-assembler with more C like syntax).

>
> call/cc?
>


well, you could probably implement a call/cc via such a mechanism, but
it isn't exactly the same thing as call/cc, since all this would be
operating at a much lower level (call/cc being a much higher-level
mechanism).


basically, in typical ASM (or, non-macro ASM), there are no actual
"functions" in the high-level sense, it is mostly all just registers,
memory, and labels. the calling-convention is then a convention built on
top of these lower-level constructions.

in something like a code-generator, whatever higher-level constructions
exist, are typically mapped over to a "call frame" structure on the
stack, for example:

....
arg3
arg2
arg1
return EIP
saved EBP
var1
var2
var3
....

it can then be asserted then, that the main thing a function body does
is to establish the layout of the call-frame, but does not necessarily
mandate how control flow either enters or leaves the function (the top
of the function then is merely a "default" entry point).

then, a person can allow all the crazy semantics that can be allowed via
horizontal gotos.

calls via a label would be special, mostly in that a call via a label
would construct a new call-frame.

the most likely way to deal with this would be to mark the label as
"extern", and then the compiler would spit out the code necessary to
allow calls into the function via this label (so, it would "cheat"
slightly in this sense, by effectively generating code for multiple
entry points to a function, making the callable label-pointer
essentially be a normal function pointer).

some cases would probably also require generating frame-cleanup code
(similar to that used in tail-call optimization).


but, yeah, still all hypothetical at this point...

or such...

 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      01-20-2013
"BGB" <(E-Mail Removed)> wrote in message
news:kdg35c$kvt$(E-Mail Removed)...
> On 1/19/2013 7:26 PM, BartC wrote:


>> I use computed goto for threaded code (I assume computed goto is the use
>> of label pointers). I use it because in my project, alternatives are
>> slower.
>>

>
> yeah. computed goto is the use of label pointers and then using goto to
> call them. it is nifty, but sadly, isn't available with MSVC, so I am
> mostly using function pointers, but still using threaded code.
>
> this works ok, but still can't reach native speeds.


With my interpreter, and a combined bunch of benchmarks, I get a time of 51
seconds using these label pointers (for opcode dispatch). Using plain
switch it was 62 seconds. With a simple loop calling function pointers it
was 75 seconds (because most inlining ability is lost).

So there's not much in it, and still a long way from native code. (And when
I tested on an ARM processor, there seemed little difference between
indirect gotos, and switch.)

(Plugging in an actual ASM module to do the dispatch, but initially just
calling the C handlers, it was 82 seconds! So the same time as the function
dispatch method, plus extra overheads of the more complex ASM threaded
call-chain (plus save/restore registers). However, it took a just a few
hours to write enough inline functions in the ASM module to beat the best
51-second C timing, but only by 10%, so not worth the trouble or the loss of
flexibility.)

> like, using selection sort (an O(n^2) sort) on a 64k element array still
> takes several minutes.
>
> likewise for things like, at present, doing a "fib(40)" or similar via the
> recursive Fibonacci function.


For fib(40) I get 18 seconds using the function pointer loop, and 9.5
seconds using label pointers. I've achieved about 5 seconds in the past, but
there were too many complications. (Native C was 0.8 to 1.4 seconds.)

I've given up trying to get anywhere near the speed of native code (which is
not going to happen with the sort of approaches I use); instead I have in
mind an intermediate language to do that!

> though all this is "usually good enough", and I have been at least
> gradually squeezing more speed out of the plain-C interpreter.


That's the thing; for real programs, differences in speed are smaller (while
a few things are faster than C, for a given amount of effort, thanks to the
use of counted strings for example).


--
Bartc

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Question about unreasonable slowness allenjo5@mail.northgrum.com Python 10 11-21-2006 08:22 PM
New releases: ER, Dr. Katz & Survivor: Vanuatu: Updated complete downloadable R1 DVD DB & info lists Doug MacLean DVD Video 1 08-31-2006 08:09 PM
New releases: King Kong (2005), Madagascar & Dr. Katz: Updated complete downloadable R1 DVD DB & info lists Doug MacLean DVD Video 1 01-31-2006 03:09 PM
Unreasonable Reboots =?Utf-8?B?Z3Rw?= Wireless Networking 3 10-14-2004 06:45 PM
Katz DDL Forum! Join today! Jay D. Computer Support 1 07-27-2003 09:31 PM



Advertisments