function casts

 
 
BartC
 
      10-13-2012
"Ben Bacarisse" <(E-Mail Removed)> wrote in message
news:0.ac8d25d47314c8bfed23.20121013132309BST.87k3 (E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


>> C also does pointer arithmetic using object-sized offsets, while the
>> source language uses byte offsets! Generating dumb ASM code is actually
>> far easier.
>> (What's much harder is generating efficient, optimised ASM.)

>
> This point and the last one suggest that you are generating code at what
> I'd be tempted to call the wrong level. I.e. rather than a compiler it
> sounds like a translator.


It originally targeted ASM, so I guess it's a compiler. Maybe I could
generate C from the AST instead of intermediate code, and it might be
possible to create more structured C that way than the 'flat' C I'm creating
now. But I don't think that would solve too many of my current problems, and
would probably create new ones!

> If your source language pointers are type-agnostic and use byte
> offsets, I'd expect them all to appear as unsigned char pointers in
> the C output. C's rules for pointer compatibility and arithmetic
> won't then come into it.


My pointers are typed, but I just use byte offsets which are more flexible.
As an example of the type issues I have to deal with in C, take this program
in my source language (to do with the function casts I was asking about).
Not C-like, but hopefully clear:

ref int pcptr

proc disploop=
ref ref proc() fnptr @ pcptr

do
fnptr^^()
od
end

This is the C code generated (in the above code, fnptr and pcptr are the
same memory location, that needs to be emulated in C):

static void disploop(void);
static int (*pcptr);

void disploop(void) {
/* void (*(*fnptr)) (void) @pcptr */

L2:
(*((void (*(*)) (void))pcptr))();
goto L2;
}

And this is the x86 code that I might generate instead, just the function
body:

disploop:
L3:
mov esi,[pcptr]
mov eax,[esi]
call eax
jmp L3
retn

Notice this doesn't involve asterisks, parentheses or voids that have to be
in exactly the right pattern! Although the generated C could have been
tidied up a bit, I don't see how this stuff can be avoided, no matter how
the C is generated.
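
One way to make the asterisk-and-parenthesis pattern less fragile, offered only as a sketch under assumptions and not as what the generator above does, is to emit a typedef for the function-pointer type once and cast through it. The names `handler_fn`, `op_a`, `op_b`, `calls` and `dispatch_n` are hypothetical. `pcptr` is declared `void *` here (not `int *` as in the generated code), and the sketch advances the pointer itself and stops after `n` calls so that it terminates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; only pcptr echoes the generated code above. */
typedef void (*handler_fn)(void);      /* pointer to a void f(void) function */

static int calls;                      /* running total updated by handlers */
static void op_a(void) { calls += 1; }
static void op_b(void) { calls += 10; }

static void *pcptr;                    /* really points at a handler_fn slot */

/* Bounded version of the dispatch loop: call the handler that pcptr
   points at, then step to the next slot, n times in total. */
static void dispatch_n(int n)
{
    assert(pcptr != NULL);
    for (int i = 0; i < n; i++) {
        handler_fn *fnptr = (handler_fn *)pcptr;  /* one readable cast */
        (*fnptr)();                               /* fnptr^^() in the source */
        pcptr = fnptr + 1;                        /* assumption: slots are contiguous */
    }
}
```

With a table such as `handler_fn table[] = { op_a, op_b, op_a };` and `pcptr = table;`, calling `dispatch_n(3)` runs the three handlers in order. The typedef does not remove the cast, but it confines the awkward declarator to one line of output.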

> Surely all your source-language names are kept in a separate space? You
> can then use an "escape" convention to capture case: source name func
> maps to srcfunc__ and Func to, say, src_func__. I can't see how a
> reserved word can be a problem unless you are trying to do some sort of
> minimal source-level translation rather than what would be more normally
> termed compilation.
>
> If, on the other hand, you've decided that C functions should be
> callable with no wrappers at all, then I think you've just made a rod
> for your own back.


Maybe. Some problems can be solved by writing wrapper functions *in C*. But I'm
trying to avoid that (you can't tell someone using the language and trying
to call an imported C function, to switch to another language!). As a silly
example, take this program in my source language:

stop

which is translated to a call to a runtime function, also in the source
language, which happens to call the imported C function exit(). However
'exit' is a reserved word in my language! So I call a wrapper function in C,
which calls exit() for me.
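
The escape convention in the quoted text above (func becomes srcfunc__, Func becomes src_func__) could be sketched roughly as follows. This is a guess at one possible scheme, not something either poster specified: the helper name `mangle_name` and its return codes are hypothetical, and the scheme as written lets a source name that already contains an underscore collide with a mixed-case one (a_b and aB both give srca_b__), which a real translator would have to handle.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes the C-side name for `src` into `out`.
   Scheme: "src" prefix, each upper-case letter becomes '_' plus its
   lower-case form, and "__" is appended. Returns 0 on success, -1 if
   `out` (of size `cap`) is too small. */
static int mangle_name(const char *src, char *out, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && out != NULL);

    if (cap < 4) return -1;
    memcpy(out, "src", 3);
    n = 3;

    for (; *src != '\0'; src++) {
        unsigned char ch = (unsigned char)*src;
        if (isupper(ch)) {
            if (n + 2 >= cap) return -1;
            out[n++] = '_';
            out[n++] = (char)tolower(ch);
        } else {
            if (n + 1 >= cap) return -1;
            out[n++] = (char)ch;
        }
    }
    if (n + 2 >= cap) return -1;
    out[n++] = '_';
    out[n++] = '_';
    out[n] = '\0';
    return 0;
}
```

Under this scheme a source-language name exit would be emitted as srcexit__, so it could not clash with the C library's exit(); calling the real exit() from the source language would still need some import mechanism or wrapper.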

--
Bartc

 
 
 
 
 
Les Cargill
 
      10-13-2012
BartC wrote:
> "Les Cargill" <(E-Mail Removed)> wrote in message
> news:k5ajp5$c73$(E-Mail Removed)...
>> BartC wrote:

>
>>> It's been a struggle, as the two languages don't match exactly, neither
>>> is C anywhere near the 'portable assembler' that everyone says it is.

>>
>> Dunno - 'C' makes a pretty good target language, but you have to Think
>> Stupid.

>
> I can't; C insists on very strict type-checking, so I have to second-guess
> what C expects the type of any expression to be, and compare that with the
> original types, and any coercions, of the source language.
>
> Then you may have to apply extra coercions which work differently on
> l-values, &-terms and *-terms. And there's the quirk where A usually means
> the value of A, *unless* it's an array then it means &A[0] (in the source
> language, arrays have values (as well as having arbitrary lower bounds for
> good measure!)).
>


So that's not *NEARLY* stupid enough. By "Think Stupid", I mean
vastly reduce the number of degrees of freedom for the generated code.

> C is fussy about type-matching pointers (so T* and U* don't match); the
> source language doesn't care.
>


Generally, I store (void *) pointers in a table, then have a column in
the table that suggests a type.

> C also does pointer arithmetic using object-sized offsets, while the source
> language uses byte offsets! Generating dumb ASM code is actually far
> easier.
> (What's much harder is generating efficient, optimised ASM.)
>


Could be, then. I mean no disrespect, but it sounds like you're
doing it the Hard Way, which is ... hard.

> (There are many other issues: for example C is case-sensitive, the source
> language isn't; trying to import a C function such as fopen() into the
> source language, without also importing the FILE* type; words that are
> identifiers in one language but reserved words in the other, etc etc.)
>
>
>>> I can't make use of half the language (most of the control statements
>>> are out for example), and the output is terrible-looking, completely
>>> unstructured C.

>
>> The best way to generate 'C' is to
>> generate tables that hand-coded 'C' then operates on.
>> Tables can include callbacks...

>
> My starting point is intermediate code, a '3-address-code' kind of
> representation. At this point everything is already linearised and
> unstructured, as the usual target is assembly code. You then translate
> each instruction into a simple C statement, typically an assignment of
> the form X = Y op Z;
>
> It sounds easy; it isn't!
>


*Shrug?*

struct Thing[] = {
{ (void *)&X, (void *)&Y, "op", (void *)&Z },
....
};

void eval(Thing *t)
{
....
}

....

eval(&Thing[0]);

....


--
Les Cargill


 
 
 
 
 
BartC
 
      10-13-2012
"Les Cargill" <(E-Mail Removed)> wrote in message
news:k5ck02$tsk$(E-Mail Removed)...
> BartC wrote:


>> C also does pointer arithmetic using object-sized offsets, while the
>> source language uses byte offsets! Generating dumb ASM code is actually
>> far easier.


> Could be, then. I mean no disrespect, but it sounds like you're
> doing it the Hard Way, which is ... hard.


What, always using byte offsets for pointers? It's very useful. What's hard
about it is that C works differently.

(Perhaps I should explain I'm using dynamic language A for this project,
which compiles source code of a static language B, into C source code. I
hardly write any actual C at all; mainly when I find myself editing the C
output files by mistake.)

>> My starting point is intermediate code, a '3-address-code' kind of
>> representation. At this point everything is already linearised and
>> unstructured, as the usual target is assembly code. You then translate
>> each instruction into a simple C statement, typically an assignment of
>> the form X = Y op Z;
>>
>> It sounds easy; it isn't!


> struct Thing[] = {
> { (void *)&X, (void *)&Y, "op", (void *)&Z },
> ...
> };


OK, that's a *bit* like my intermediate code (my operands can be more
elaborate and have type info attached to each, but it is a faithful
representation of what the source HLL is trying to do).

> void eval(Thing *t)
> {
> ...
> }
>
> ...
>
> eval(&Thing[0]);


Here I'm not sure what's happening. Does this code reside in a compiler (and
the Thing array has been filled in by the compiler)? Then the eval() routine
presumably writes out some C code into a separate file. In which case it's
not that different to what I'm actually doing. (And it's the tiny details
that are the problem: endless compile errors on the C output code, due to
some subtleties of the C type system that I have no interest in yet have to
understand and fix!)

Or is some kind of interpretation going on?

--
Bartc

 
 
Keith Thompson
 
      10-13-2012
"BartC" <(E-Mail Removed)> writes:
> "Les Cargill" <(E-Mail Removed)> wrote in message
> news:k5ck02$tsk$(E-Mail Removed)...
>> BartC wrote:

>
>>> C also does pointer arithmetic using object-sized offsets, while the
>>> source language uses byte offsets! Generating dumb ASM code is actually
>>> far easier.

>
>> Could be, then. I mean no disrespect, but it sounds like you're
>> doing it the Hard Way, which is ... hard.

>
> What, always using byte offsets for pointers? It's very useful. What's hard
> about it is that C works differently.


C pointers of type `unsigned char*` work *exactly* like that.
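
A small sketch of that point: adding 1 to an `int *` moves the address by sizeof(int) bytes, while adding 1 to an `unsigned char *` moves it by exactly one byte. The helper name `byte_distance` is hypothetical and exists only to make the difference measurable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: distance in bytes between two pointers into the
   same object, measured through unsigned char pointers. */
static ptrdiff_t byte_distance(const void *from, const void *to)
{
    assert(from != NULL && to != NULL);
    return (const unsigned char *)to - (const unsigned char *)from;
}
```

For an array `int a[4]` viewed through `unsigned char *b = (unsigned char *)a`, `byte_distance(a, a + 1)` equals sizeof(int) and `byte_distance(b, b + 1)` equals 1, so a source-language byte offset can be applied to `b` without any scaling.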

[...]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
BartC
 
      10-13-2012


"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:


>> What, always using byte offsets for pointers? It's very useful. What's
>> hard about it is that C works differently.

>
> C pointers of type `unsigned char*` work *exactly* like that.


Yes, I know, that's why I'm using lots of casts to (unsigned char*).

But in general, a pointer will have a stride of N bytes. It's not always
clear if that can be used directly (because the pointer might be unaligned,
and/or the offset could be odd).
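
When the alignment of the resulting address is not known, one portable option is to copy the bytes with memcpy and never dereference a possibly misaligned `int *` at all. The following is a minimal sketch; the helper names `load_int_at` and `store_int_at` are hypothetical, and whether a given compiler turns the memcpy into a single load or store is not something the sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: read an int stored `offset` bytes into the object
   at `base`. memcpy copies byte by byte, so the source address does not
   have to be aligned for int. */
static int load_int_at(const void *base, size_t offset)
{
    int value;
    assert(base != NULL);
    memcpy(&value, (const unsigned char *)base + offset, sizeof value);
    return value;
}

/* Matching store, again via memcpy. */
static void store_int_at(void *base, size_t offset, int value)
{
    assert(base != NULL);
    memcpy((unsigned char *)base + offset, &value, sizeof value);
}
```

A translator could emit these two calls for every access whose offset or base alignment it cannot prove, and fall back to a plain typed pointer only where alignment is known.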

--
Bartc



 
 
Ian Collins
 
      10-13-2012
On 10/14/12 12:16, BartC wrote:
>
>
> "Keith Thompson"<(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> "BartC"<(E-Mail Removed)> writes:

>
>>> What, always using byte offsets for pointers? It's very useful. What's
>>> hard about it is that C works differently.

>>
>> C pointers of type `unsigned char*` work *exactly* like that.

>
> Yes, I know, that's why I'm using lots of casts to (unsigned char*).


Why cast at all? Do everything with unsigned char*.

--
Ian Collins
 
 
Les Cargill
 
      10-14-2012
BartC wrote:
> "Les Cargill" <(E-Mail Removed)> wrote in message
> news:k5ck02$tsk$(E-Mail Removed)...
>> BartC wrote:

>
>>> C also does pointer arithmetic using object-sized offsets, while the
>>> source language uses byte offsets! Generating dumb ASM code is actually
>>> far easier.

>
>> Could be, then. I mean no disrespect, but it sounds like you're
>> doing it the Hard Way, which is ... hard.

>
> What, always using byte offsets for pointers? It's very useful. What's hard
> about it is that C works differently.
>
> (Perhaps I should explain I'm using dynamic language A for this project,
> which compiles source code of a static language B, into C source code. I
> hardly write any actual C at all; mainly when I find myself editing the
> C output files by mistake.)
>
>>> My starting point is intermediate code, a '3-address-code' kind of
>>> representation. At this point everything is already linearised and
>>> unstructured, as the usual target is assembly code. You then translate
>>> each instruction into a simple C statement, typically an assignment of
>>> the form X = Y op Z;
>>>
>>> It sounds easy; it isn't!

>
>> struct Thing[] = {
>> { (void *)&X, (void *)&Y, "op", (void *)&Z },
>> ...
>> };

>
> OK, that's a *bit* like my intermediate code (my operands can be more
> elaborate and have type info attached to each. But it is a faithful
> representation of what the source HLL is trying to do).
>
>> void eval(Thing *t)
>> {
>> ...
>> }
>>
>> ...
>>
>> eval(&Thing[0]);

>
> Here I'm not sure what's happening. Does this code reside in a compiler
> (and the Thing array has been filled in by the compiler)?


The Thing array is what has been generated. eval is handwritten.

> Then the eval() routine presumably writes out some C code into a
> separate file. In which case it's
> not that different to what I'm actually doing. (And it's the tiny details
> that are the problem: endless compile errors on the C output code, due to
> some subtleties of the C type system that I have no interest in yet have to
> understand and fix!)
>
> Or is some kind of interpretation going on?


Basically - yes. You stick values into an array, then use an
interpreter (eval(...)) to exploit the array.
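
To make the idea concrete, here is a minimal sketch of a generated table plus a handwritten interpreter. It is a simplification under stated assumptions, not the code elided above: the struct layout, the single-character operator field and the int-only operands are all hypothetical, and a real table would presumably carry a type tag per operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: one row per three-address instruction X = Y op Z.
   All operands are assumed to be int here. */
struct thing {
    int *x;          /* destination */
    const int *y;    /* left operand */
    char op;         /* '+', '-' or '*' */
    const int *z;    /* right operand */
};

/* Handwritten interpreter: executes `count` rows in order.
   Returns 0 on success, -1 on an unknown operator. */
static int eval(const struct thing *t, size_t count)
{
    assert(t != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        switch (t[i].op) {
        case '+': *t[i].x = *t[i].y + *t[i].z; break;
        case '-': *t[i].x = *t[i].y - *t[i].z; break;
        case '*': *t[i].x = *t[i].y * *t[i].z; break;
        default:  return -1;
        }
    }
    return 0;
}
```

The generator then only has to emit initializer rows such as `{ &c, &a, '+', &b }`, which are far more regular than arbitrary C expressions; the cost is that every operation goes through the interpreter's switch at run time.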

>

--
Les Cargill

 
 
Keith Thompson
 
      10-14-2012
"BartC" <(E-Mail Removed)> writes:
> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> "BartC" <(E-Mail Removed)> writes:

>
>>> What, always using byte offsets for pointers? It's very useful. What's
>>> hard about it is that C works differently.

>>
>> C pointers of type `unsigned char*` work *exactly* like that.

>
> Yes, I know, that's why I'm using lots of casts to (unsigned char*).


Why not just use pointers of type `unsigned char*` in the first place?

[...]

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
BartC
 
      10-14-2012


"Keith Thompson" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> "BartC" <(E-Mail Removed)> writes:
>> "Keith Thompson" <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...
>>> "BartC" <(E-Mail Removed)> writes:

>>
>>>> What, always using byte offsets for pointers? It's very useful. What's
>>>> hard about it is that C works differently.
>>>
>>> C pointers of type `unsigned char*` work *exactly* like that.

>>
>> Yes, I know, that's why I'm using lots of casts to (unsigned char*).

>
> Why not just use pointers of type `unsigned char*` in the first place?


This subthread is about the use of C as target language for a translator
from another language.

Pointers to all kinds of types will crop up everywhere, but particularly
where a direct C idiom (a->b, c[i] etc) can't be used. Then these casts are
necessary. (And once pointer expressions are used in some places, then it's
often easier to just use them everywhere rather than investigate which could
use a->b, c[i] etc)

For example, the other language might have a C-like struct data-type, but
with the ability to have extra unofficial fields defined by a type and a
byte offset from the start:

typedef struct _s {
float a,b,c,d;
} S;

S s;

Suppose this struct has another unofficial field 'x' of type int at byte
offset 12. To access this field in s, the expression:

*(&s+12)

can't be used; &s is a pointer to a struct (of stride perhaps 16 bytes), so
&s+12 will point far outside the struct. Instead, if the offset is a multiple of
sizeof(int), the following could be used:

*((int*)&s+3)

If the offset was 10 instead (sometimes unaligned accesses are fine; or
perhaps s could itself be misaligned by 2 bytes), then this is impossible in
C without using byte offsets somewhere:

*((int*)((char*)&s+10))

(Most uses of these 'unofficial' fields, ie. those which redefine, shadow or
alias existing fields, could be represented in C by anonymous unions and
structs. Such unions and structs are available using gcc. However they are
still implemented in the source language by byte offsets.

Where x in the above example is at offset 12, then an official (gcc)
definition might be:

typedef struct _s {
float a,b,c;
union {float d; int x;};
} S;

Where x is at offset 10, then it's a bit more fiddly:

#pragma pack(1)
typedef struct _s {
union {
struct {float a,b,c,d;};
struct {char dummy[10]; int x;};
};
} S;

You can see that just using a byte offset might be simplest!)
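
The two views can be checked against each other. The sketch below declares a struct with an anonymous union (standard since C11, and a gcc extension before that) that overlays x on the fourth float, then reaches the same storage by byte offset. The names `overlay_s` and `field_at` are hypothetical; on a typical implementation with 4-byte float and int the offset in question is 12, but the code uses offsetof and does not hard-wire that.

```c
#include <assert.h>
#include <stddef.h>

/* C11 anonymous union: x shares storage with d. Type name is hypothetical. */
typedef struct overlay_s {
    float a, b, c;
    union { float d; int x; };
} overlay_s;

/* Hypothetical helper: address of the byte `offset` bytes into `obj`. */
static void *field_at(void *obj, size_t offset)
{
    assert(obj != NULL);
    return (unsigned char *)obj + offset;
}
```

After `s.x = 42;`, the expression `*(int *)field_at(&s, offsetof(overlay_s, x))` reads the same 42, because the byte offset and the union member name the same int object. The byte-offset form needs no declaration per overlay, which is the convenience being argued for above.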

--
Bartc

 
 
Ben Bacarisse
 
      10-14-2012
"BartC" <(E-Mail Removed)> writes:

> "Keith Thompson" <(E-Mail Removed)> wrote in message
> news:(E-Mail Removed)...
>> "BartC" <(E-Mail Removed)> writes:
>>> "Keith Thompson" <(E-Mail Removed)> wrote in message
>>> news:(E-Mail Removed)...
>>>> "BartC" <(E-Mail Removed)> writes:
>>>
>>>>> What, always using byte offsets for pointers? It's very useful. What's
>>>>> hard about it is that C works differently.
>>>>
>>>> C pointers of type `unsigned char*` work *exactly* like that.
>>>
>>> Yes, I know, that's why I'm using lots of casts to (unsigned char*).

>>
>> Why not just use pointers of type `unsigned char*` in the first place?

>
> This subthread is about the use of C as target language for a translator
> from another language.
>
> Pointers to all kinds of types will crop up everywhere, but particularly
> where a direct C idiom (a->b, c[i] etc) can't be used. Then these casts are
> necessary. (And once pointer expressions are used in some places, then it's
> often easier to just use them everywhere rather than investigate which could
> use a->b, c[i] etc)
>
> For example, the other language might have a C-like struct data-type, but
> with the ability to have extra unofficial fields defined by a type and a
> byte offset from the start:
>
> typedef struct _s {
> float a,b,c,d;
> } S;
>
> S s;
>
> Suppose this struct has another unofficial field 'x' of type int at byte
> offset 12. To access this field in s, the expression:
>
> *(&s+12)
>
> can't be used; &s is a pointer to a struct (of stride perhaps 16 bytes), and
> will point far outside the struct.


Why are there structs here at all? I think that's the point that's been
made elsewhere ("be dumb, not smart") and what I was getting at by
saying that it looks more like a translator than a compiler.

In an assembler-generating compiler there would be no structs and there
don't have to be in C either. s could be a char (well, unsigned char)
array of the right size. Just as you have to tell the assembler what
size object to load from an address, you would have to do the same in
the generated C, so while s + 12 is now the right address you still
have to say how much to fetch (*(int *)(s + 12)) but it's probably
closer to the model your compiler seems to have been using before.

I suspect that you want the benefit of some C typing (so you don't have
to worry about the size of objects for example) but that's causing
problems elsewhere because the compiler's model is still based round the
raw machine picture that assembler gives you.

<snip>
> You can see that just using a byte offset might be simplest!)


Yes, and byte pointers (unsigned char *s) to go with them!

--
Ben.
 
 
 
 