On 12/14/2012 4:23 AM, Ian Collins wrote:
> BGB wrote:
>> On 12/14/2012 12:38 AM, Ian Collins wrote:
>>>
>>> Other languages have better support for manipulating JSON objects, but
>>> at least one of them (PHP) uses a C library under the hood.
>>
>> yeah...
>>
>> I use variants of both JSON and S-Expressions, but mostly for
>> dynamically typed data.
>>
>> not depending on the use of type-tags and data mined from headers would
>> require a rather different implementation strategy.
>>
>> most of my code is C, but I make fairly extensive use of dynamic-type
>> tagging.
>
> I originally wrote a JSON library to enable my (C++) server-side web
> application to interact with client-side JavaScript. I soon found the
> objects extremely useful for building dynamic type objects in general
> programming. I doubt the same would be true in C, not elegantly at least.
>
I actually use a fair amount of dynamically-typed stuff in C.
another big area has to do with my scripting language (itself largely
based on JavaScript and ActionScript).
the main downside is performance: dynamically-typed code just doesn't
perform anywhere near as well as statically-typed code.
in many areas this isn't a big issue, or it leads to a hybrid strategy,
with some occasional type-checking while most of the code remains
statically typed (for example, much of my 3D renderer works this way).
typically, this means using dynamic type-checking for things like
classifying "what type of object is this?" and dispatching to the
appropriate part of the renderer (IOW: type-checking at the level of
structures). a few places are "optimized" though, mostly to avoid
using dynamically-typed operations in tight loops (such as scene-graph
object lookup operations, ...).
side note (main project):
http://www.youtube.com/user/BGBTech
http://www.youtube.com/watch?v=hhdq87CXL0U
(note, a lot of parts of the video are from much earlier versions of my
3D engine...).
(although the 3D engine has some amount of code written in my
script-language, I generally don't use script-code within the 3D
renderer, as the language's performance is a bit weak, and the 3D
renderer is a lot more performance-sensitive).
sadly, performance is kind of a problem area sometimes, but much of this
has more to do with getting everything fed through OpenGL.
in my case, as noted, the type-tagging is largely transparent to C code,
apart from the use of special allocation and freeing functions, and
supplying type-names, ...
these type-names can be retrieved via API calls, like:
    BGBDY_API char *dygettype(dyt val);
or checked with functions like:
    BGBDY_API int dyTypeP(dyt val, char *ty);
many operations on dynamically typed values check types using these
mechanisms, but many internal operations essentially bypass them
(usually because if/else chains of "strcmp()" calls can get expensive...).
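as a rough illustration of the idea (not the actual BGBDY code), a minimal sketch of type-name tagging could keep a small header in front of each allocation. everything here (tag_header, tag_alloc, tag_free, tag_gettype, tag_typep) is a hypothetical name made up for the example, and it assumes the type-name string outlives the object:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical header stored in front of every tagged allocation;
   the padding members only exist to keep the payload suitably aligned */
typedef union {
    const char *type;   /* type-name supplied at allocation time */
    long double pad_ld;
    long long pad_ll;
    void *pad_p;
} tag_header;

/* allocate a zeroed payload of "size" bytes tagged with "type" */
void *tag_alloc(const char *type, size_t size)
{
    tag_header *h;
    assert(type != NULL);
    h = calloc(1, sizeof(tag_header) + size);
    if (!h)
        return NULL;
    h->type = type;
    return h + 1;       /* payload starts right after the header */
}

/* free a payload previously returned by tag_alloc (NULL is ignored) */
void tag_free(void *p)
{
    if (p)
        free((tag_header *)p - 1);
}

/* fetch the type-name, or NULL for a NULL pointer */
const char *tag_gettype(const void *p)
{
    return p ? ((const tag_header *)p - 1)->type : NULL;
}

/* predicate: non-zero if "p" carries the type-name "type" */
int tag_typep(const void *p, const char *type)
{
    const char *t = tag_gettype(p);
    return t && type && !strcmp(t, type);
}
```

note that tag_typep costs one strcmp() per check, so a dispatch written as a chain of such checks costs one string compare per candidate type, which is the expense mentioned above.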
elsewhere, for many core operations, rather than fetching the type-name
from the object, the code will instead fetch a "type vtable":
BGBDY_API dyt dyCopyValue(dyt p)
{
    BGBDY_ObjVTab *ty;
    if(dyCheckValueNoCopyP(p))
        return(p);
    ty=BGBDY_GetObjVTabFast(p);
    if(ty && ty->copyValue)
        return(ty->copyValue(p));
    return(p);
}
BGBDY_API int dyDropValue(dyt p)
{
    BGBDY_ObjVTab *ty;
    if(dyCheckValueNoCopyP(p))
        return(0);
    ty=BGBDY_GetObjVTabFast(p);
    if(ty && ty->dropValue)
        return(ty->dropValue(p));
    return(0);
}
where BGBDY_GetObjVTabFast basically calls off to some (slightly
long/ugly) logic for fetching the object-base, fetching the "ObjType",
and returning its "vtable" member.
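a self-contained sketch of the same dispatch pattern, under the simplifying assumption that every object starts with a header pointing at its vtable (the real lookup described above is more involved). the names val, obj_vtab, obj_head, str_obj, str_new, nil_atom, val_copy and val_drop are all hypothetical:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef void *val;                  /* stand-in for a variant reference */

typedef struct {
    const char *name;
    val (*copyValue)(val p);        /* may be NULL: value is shared as-is */
    int (*dropValue)(val p);        /* may be NULL: nothing to release */
} obj_vtab;

typedef struct { const obj_vtab *vtab; } obj_head;
typedef struct { obj_head head; char *text; } str_obj;

static val str_copy(val p);
static int str_drop(val p);
static const obj_vtab str_vtab = { "_string_t", str_copy, str_drop };
static const obj_vtab atom_vtab = { "_atom_t", NULL, NULL };

obj_head nil_atom = { &atom_vtab }; /* a type with no copy/drop handlers */

/* create a heap string object holding its own copy of "text" */
val str_new(const char *text)
{
    size_t n = strlen(text) + 1;
    str_obj *s = malloc(sizeof *s);
    assert(s != NULL);
    s->text = malloc(n);
    assert(s->text != NULL);
    memcpy(s->text, text, n);
    s->head.vtab = &str_vtab;
    return s;
}

static val str_copy(val p) { return str_new(((const str_obj *)p)->text); }
static int str_drop(val p)
{
    str_obj *s = p;
    free(s->text);
    free(s);
    return 1;
}

static const obj_vtab *get_vtab(val p)
{
    return p ? ((const obj_head *)p)->vtab : NULL;
}

/* one indirect call through the vtable, no string compares */
val val_copy(val p)
{
    const obj_vtab *ty = get_vtab(p);
    if (ty && ty->copyValue)
        return ty->copyValue(p);
    return p;
}

int val_drop(val p)
{
    const obj_vtab *ty = get_vtab(p);
    if (ty && ty->dropValue)
        return ty->dropValue(p);
    return 0;
}
```

types that leave a handler as NULL simply fall through to the default (return the same value, or do nothing), which mirrors the fallbacks in dyCopyValue and dyDropValue.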
in a few places (due mostly to performance issues), some of this has
led to frequent use of things like type-signatures and untagged union
types:
#ifndef DYTVALUE_T
#define DYTVALUE_T
typedef union
{
    s32 i;                  //int
    u32 ui;                 //uint
    s64 l;                  //long
    u64 ul;                 //ulong
    f32 f;                  //float
    f64 d;                  //double
    nlint a;                //address
    void *pv;               //raw pointer
    dyt r;                  //variant reference
    dytf cr;                //variant reference (double)
    dycClass cls;           //class
    dycObject obj;          //object
    dycArray arr;           //array
    struct { int x, y; }k;  //lexical env index
}dycValue;
#endif
the main reason being that these allow higher performance.
for the untagged union, it is because one can access values like:
dycValue a, b, c;
....
c.i=a.i+b.i;
and apparently the compiler is at least smart enough to figure this out
(and generate vaguely efficient code).
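to make the "untagged" part concrete: the union itself carries no type information, so something outside it has to say which member is live. a small sketch, assuming a one-character signature ('i', 'l', 'f', 'd') chosen purely for this example, with the hypothetical names value and value_add:

```c
#include <assert.h>
#include <stdint.h>

/* cut-down stand-in for the union above */
typedef union {
    int32_t i;
    int64_t l;
    float f;
    double d;
} value;

/* add two values whose live member is named by a signature character;
   the type is decided once here, not stored in (or read from) the value */
value value_add(char sig, value a, value b)
{
    value c = {0};
    switch (sig) {
    case 'i': c.i = a.i + b.i; break;
    case 'l': c.l = a.l + b.l; break;
    case 'f': c.f = a.f + b.f; break;
    case 'd': c.d = a.d + b.d; break;
    default:  assert(0 && "unknown signature character"); break;
    }
    return c;
}
```

each case is a plain statically-typed add on one member, so there is no per-value tag to load or check; the cost of deciding the type can be paid once, ahead of time, by whoever knows the signature.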
the reason for type-signatures is that unlike full dynamic typing,
signatures are more readily usable with the "build stuff with function
pointers" strategy.
say, for example:
void DYLL_CopyValueBufI(dycValue *v, void *p)
    { *(s32 *)p=v->i; }
void DYLL_CopyValueBufL(dycValue *v, void *p)
    { *(s64 *)p=v->l; }
....
typedef struct {
    void (*CopyValueBuf)(dycValue *v, void *p);
    void (*CopyBufValue)(void *p, dycValue *v);
    ....
    int offs;
}DYLL_ValueTransfer;
/* forward declaration (tag name chosen here), since the struct's
   function pointers refer to DYLL_ArgList itself */
typedef struct DYLL_ArgList_s DYLL_ArgList;
struct DYLL_ArgList_s {
    DYLL_ValueTransfer *ret;
    DYLL_ValueTransfer *args;
    int nargs;
    void (*CopyValuesBuf)(DYLL_ArgList *ctx, dycValue *args, byte *buf);
    void (*CopyBufValues)(DYLL_ArgList *ctx, byte *buf, dycValue *args);
    ....
};
....
void DYLL_CopyValuesBuf3(DYLL_ArgList *ctx, dycValue *args, byte *buf)
{
    ctx->args[0].CopyValueBuf(args+0, buf+ctx->args[0].offs);
    ctx->args[1].CopyValueBuf(args+1, buf+ctx->args[1].offs);
    ctx->args[2].CopyValueBuf(args+2, buf+ctx->args[2].offs);
}
BGBDY_API void DYLL_CopyValuesBuf(DYLL_ArgList *ctx,
    dycValue *args, byte *buf)
    { ctx->CopyValuesBuf(ctx, args, buf); }
where the "DYLL_ArgList" structure may be built via a function which
parses the signature.
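a minimal sketch of what such a signature-parsing builder might look like. this is not the DYLL implementation: the signature alphabet ('i', 'l', 'f', 'd'), the tight packing with no alignment padding, the fixed MAX_ARGS limit, and the names value, transfer, arg_list, arg_list_build and arg_list_pack are all assumptions made for the example:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef union { int32_t i; int64_t l; float f; double d; } value;

typedef struct {
    void (*copy_value_buf)(const value *v, void *p);
    int offs;                       /* byte offset within the packed buffer */
} transfer;

#define MAX_ARGS 8
typedef struct {
    transfer args[MAX_ARGS];
    int nargs;
    int size;                       /* total bytes used in the buffer */
} arg_list;

/* one small copier per type; memcpy avoids unaligned stores */
static void put_i(const value *v, void *p) { memcpy(p, &v->i, sizeof v->i); }
static void put_l(const value *v, void *p) { memcpy(p, &v->l, sizeof v->l); }
static void put_f(const value *v, void *p) { memcpy(p, &v->f, sizeof v->f); }
static void put_d(const value *v, void *p) { memcpy(p, &v->d, sizeof v->d); }

/* parse "sig" once, picking a copier and an offset for each argument;
   returns 0 on success, -1 on an unknown character or too many args */
int arg_list_build(arg_list *al, const char *sig)
{
    int offs = 0;
    assert(al != NULL && sig != NULL);
    al->nargs = 0;
    for (; *sig; sig++) {
        transfer *t;
        int size;
        if (al->nargs >= MAX_ARGS)
            return -1;
        t = &al->args[al->nargs];
        switch (*sig) {
        case 'i': t->copy_value_buf = put_i; size = (int)sizeof(int32_t); break;
        case 'l': t->copy_value_buf = put_l; size = (int)sizeof(int64_t); break;
        case 'f': t->copy_value_buf = put_f; size = (int)sizeof(float); break;
        case 'd': t->copy_value_buf = put_d; size = (int)sizeof(double); break;
        default:  return -1;
        }
        t->offs = offs;
        offs += size;
        al->nargs++;
    }
    al->size = offs;
    return 0;
}

/* pack the arguments: no type checks left, just indirect calls */
void arg_list_pack(const arg_list *al, const value *args, unsigned char *buf)
{
    int k;
    for (k = 0; k < al->nargs; k++)
        al->args[k].copy_value_buf(&args[k], buf + al->args[k].offs);
}
```

the signature is parsed once when the list is built, so each later call only walks a table of function pointers; the unrolled DYLL_CopyValuesBuf3 above goes one step further by removing the loop for a known argument count.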
....
which, although arguably not exactly efficient, could always be a good
deal worse...
(there is a good deal more of this sort of logic for working with
"va_list" argument lists as well, typically for indirect calls
originating within C land. a lot of this stuff has to deal with calls
between natively compiled and interpreted code).
generally the "buffer" is a packed representation of an argument list,
and is what is used by the ASM parts of the call-handling machinery
(which may in turn repack these buffers into ABI-specific
representations, and then initiate the call).
partly, this is because it isn't really desirable to have to deal with
the various possible argument-list representations and ASM code at the
same time.
in some places, the function-pointer funkiness serves to bypass older
machinery which often involves lots of sometimes long if/else chains.
a downside though is that a lot of this is, admittedly, a bit more
painful to work with, and the code can get very long/winding and ugly
(so, the spread of this sort of code in many places within the VM core
has both good and bad points).
however, a lot of these sorts of mechanisms end up somewhat abstracted,
and so are generally unseen by "user code" (luckily, it all looks much
nicer from the outside).
or such...