Stuart
Guest
Posts: n/a
|
Stuart wrote:
>> If the compiler wants to do anything with a class, it needs to know
>> the complete definition of the class and all its base classes.

On 10/11/12 Pavel wrote:
> No, it does not need to know "the complete definition".

You got me there. I meant to write complete declaration. Sorry.

> I am not sure about "anything", but specifically for generating code
> that calls a virtual function all the compiler needs to know is how to
> calculate the address of a particular subobject of the object of the
> most-derived class pointed by the pointer given to compiler.

This is an implementation detail of the compiler. "Thunking" compilers do not have to do any additional work when they invoke a virtual function:

    Base* ptr = ...;
    ptr->foo ();

will result in the following (pseudo-) ASM:

    push ptr;
    call *[ptr + &foo]

where &foo gives the slot of the vtable where foo's address is stored. In order to be able to generate such code, the compiler only has to know the complete declaration of Base and Base's base classes (or else it could not figure out the slot number of foo), even if ptr is initialized with an instance of the hitherto unknown class Derived.

[snip]

> [...] I admit the "thunk" way avoids direct penalties for calling
> virtual by pointers on first base).

Here we go. If I may cite you from up-thread:

On 10/7/12 Pavel wrote:
> I think the above is not accurate. C++ code does suffer performance
> penalties from using multiple inheritance. Moreover, and what's
> especially frustrating, even the code that does not use multiple
> inheritance (in fact, any code using virtual functions) suffers from
> at least one performance penalty imposed by the way C++ supports
> multiple inheritance: the necessity to read the offset of the call
> target [...]

Apparently you proved yourself wrong: It is not an inherent limitation of the C++ language that it _must_ be slower for single-inheritance hierarchies because it also supports multiple-inheritance.

I think it would be more correct to say that C++ can be as efficient as any single-inheritance vtable-based language, provided the implementation of the C++ compiler is good (IOW, uses thunks).

Regards,
Stuart
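To make the "first base" versus "non-first base" distinction in this exchange concrete, here is a minimal sketch. The class names `First`, `Second`, `Derived` and the helper functions are hypothetical, and the claim that the first base sits at offset zero is an assumption about mainstream ABIs (for example the Itanium C++ ABI used by GCC and Clang), not something the C++ standard promises.

```cpp
#include <cstddef>

// Two unrelated polymorphic bases; each carries some data so that
// neither subobject can be optimized away.
struct First  { virtual ~First() = default;  virtual int foo() { return 1; } long a = 0; };
struct Second { virtual ~Second() = default; virtual int bar() { return 2; } long b = 0; };

struct Derived : First, Second {
    int foo() override { return 10; }
    int bar() override { return 20; }
};

// Byte distance from the start of the Derived object to one of its bases.
template <class Base>
std::ptrdiff_t offset_of_base(Derived& d) {
    return reinterpret_cast<char*>(static_cast<Base*>(&d)) -
           reinterpret_cast<char*>(&d);
}

// On common ABIs the first base shares the address of the whole object,
// so a call through First* needs no pointer adjustment.
bool first_base_shares_address() { Derived d; return offset_of_base<First>(d) == 0; }

// The second base lives further inside the object, so a call through
// Second* must get `this` moved back to the Derived object somewhere:
// at the call site or in a thunk.
bool second_base_is_displaced() { Derived d; return offset_of_base<Second>(d) != 0; }

// Whatever mechanism the compiler picks, the override is still reached.
int call_through_second() { Derived d; Second* p = &d; return p->bar(); }
```

Calling `call_through_second()` goes through a `Second*` and still lands in `Derived::bar`; how `this` gets corrected on the way is exactly the implementation choice being argued about in this thread.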
88888 Dihedral
Guest
Posts: n/a
|
On Monday, October 8, 2012 at 4:57:53 AM UTC+8, Pavel wrote:
> lieve again wrote:
>> On 23 Sep., 23:25, Öö Tiib <oot...@hot.ee> wrote:
>>> On Sunday, 23 September 2012 23:40:42 UTC+3, lieve again wrote:
>>>> Ok, so the main reason for not implementing multiple inheritance
>>>> (without workarounds) is the complexity added to the programmers
>>>> (learning curve) and to the compiler developers (diamond
>>>> problem, ...).
>>>
>>> Yes, that is the reason. Also the solution, virtual inheritance is
>>> not too efficient nor simple.
>>>
>>>> I thought maybe making the interfaces pure virtual,
>>>> there was a way to avoid the extra vpointers and I wanted to know how..
>>>> Then if I start adding pure virtual classes to impose the derived
>>>> classes with some kind of features like:
>>>> class Derived : implements Readable, Writeable, Comparable,
>>>> Convertible ...
>>>> regardless of the programming language, we are ending with instances
>>>> of the derived classes having 20 bytes or more even being those
>>>> classes with no members or empty. It is good to know.
>>>
>>> The bytes actually are cheap these days ... unless you write for some 8 bit
>>> controller. On common platforms most of the memory goes into visuals
>>> and sounds and helper texts and other massive data like that. Couple of bytes
>>> for vtables of object that manages such data are usually not worth talking
>>> about.
>>>
>>> OTOH on the 8-bit controllers where you really count bytes you do not have
>>> much need for such large class hierarchy anyway.
>>
>> Ok, so its a problem suffered from all the actual programming
>> languages, I think it could be a kind of limitation to obtain so big
>> objects, but its so.
>> Maybe the way to impose some kind of properties or functions to a
>> class without the vpointers replication penalty is the concepts
>> extension of C++11.
>>
>> Regards,
>
> I think the above is not accurate. C++ code does suffer performance penalties
> from using multiple inheritance. Moreover, and what's especially frustrating,
> even the code that does not use multiple inheritance (in fact, any code using
> virtual functions) suffers from at least one performance penalty imposed by the
> way C++ supports multiple inheritance: the necessity to read the offset of the
> call target within the object of the most-derived class overriding the virtual
> method and subtracting this offset from the passed pointer to let the virtual
> function implementation access to the object it expects.
>
> Languages with single inheritance can assign a single offset from the start of
> the virtual table of the most-derived class of an object to the start of the
> slice of that class' virtual table correspondent to the virtual table of any of
> its bases. This effectively means that any class in such a language can have a
> single virtual table and the objects of the most derived class and the
> correspondent objects of all its base classes can have a single address.
>
> Complicating C++ specification by introducing special kind of classes that would
> be forbidden from being used as base classes in multiple inheritance (or, even
> more complex but more rewarding as well -- the classes that cannot be used as
> other-than-first base classes in multiple inheritance) could eliminate this
> penalty for the language users who do not use multiple inheritance.
>
> C++ does not have a chance of assigning any virtual function once defined in a
> class a single offset in a virtual table; therefore, it has to have multiple
> virtual tables. As it is, that is without the complication mentioned above, C++
> can not let its compiler know at a virtual call site that the call is on the
> object that is the first base of the most-derived class; hence the necessity to
> always read and apply the offset at run-time.
>
> Languages with single inheritance and interfaces still have same performance
> advantages as languages without interfaces for the classes that do not implement
> interfaces (that is, they live to the promise of not imposing cost of a feature
> on the code that does not use it better than C++ does). For a class that does
> implement interfaces, the implementations have a choice of either avoiding space
> cost but making calls by interface significantly more expensive or adding
> pointers to individual "interface virtual tables" to the layout of an object of
> such a class and having calls by an interface only one indirection more
> expensive than regular virtual calls. I believe Java 1.0 took the first path and
> Java 1.1 and all its further versions took the second.
>
> -Pavel

I think testing equivalent source programs in Objective-C and C++ under the same OS can tell the differences. Linux or BSD can be
Richard Damon
Guest
Posts: n/a
|
On 10/10/12 1:36 AM, Pavel wrote:
> True but thunk approach comes at higher cost for calling at non-first
> base of extra jump as compared with 'classic' approach. Extra jump in
> chunk is equivalent to at least one extra memory read (only to
> instruction instead of data cache) and some instruction decoding. The
> approach also somewhat increases memory usage with the second virtual
> table and thunks. But you are right about zero-cost if calls by
> non-first base are not used -- I completely forgot about thunk approach.
>
> -Pavel

Actually, most of the jumps can be removed, by placing the thunk directly in front of the code for the subroutine.

In fact, if we define that when calling a virtual function, the this pointer is always set to point to the sub object of the type where the function was first declared as virtual, then the function receiving the call knows exactly how to adjust the this pointer to point to the now full object (of the type the function belongs to). and can place that adjustment as a thunk just before the normal code for the function if we want to allow it to be called "normally" in a non-virtual manner without needed to pre-adjust the this pointer for those calls.

The only case where these adjustments become non-trivial would be in cases of virtual inheritance where the pointer adjustment will need to do a look up rather than a simple offset.

The simple procedure here is:

When calling a virtual function that was first defined in a non-first base class (or a non-virtual function that is last defined in a non-first base class), before making the call (this will be a simple addition to the base pointer if we have non-virtual inheritance).

At the function side, if the function is a virtual function of a type that has the initial definition in a non-first base class, include code to adjust the this pointer from that base class to the actual class just before the normal entry point, and have the vtable point there. The regular entry point can be used for non-virtual calls, with the caller passing the "normal" this pointer.

This is probably minimal overhead (works best if the this pointer is passed in a register so the address offset is quick and easy). The only case where you get "extra" adjustments is if you make the call through a pointer to a class derived from one is non-first base derived from the class that first defined the virtual function to a function defined in class similarly after the multiple derivation, as you will adjust to the base and then back.
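The scheme described in this post can be imitated by hand in portable C++, which may help to see what the compiler would emit. This is only a sketch under stated assumptions: `Object`, `FirstPart`, `SecondPart`, `body` and `thunk_from_second` are hypothetical names, the "vtables" are plain arrays of function pointers, and a real compiler would place the adjustment directly in front of the function body instead of making a second call.

```cpp
#include <cstddef>

using Slot = int (*)(void* self);    // one hand-made vtable slot

// A hand-built object with two "base subobjects", each with its own vptr.
struct FirstPart  { const Slot* vptr; int x; };
struct SecondPart { const Slot* vptr; int y; };
struct Object     { FirstPart first; SecondPart second; };

// Normal entry point: expects `self` to address the whole Object.
int body(void* self) {
    Object* o = static_cast<Object*>(self);
    return o->first.x + o->second.y;
}

// Thunk entry: receives a pointer to the SecondPart subobject, steps back
// to the start of the Object, then continues into the normal body.
int thunk_from_second(void* self) {
    char* adjusted = static_cast<char*>(self) - offsetof(Object, second);
    return body(adjusted);
}

// The first base's table points straight at the body (no adjustment);
// only the non-first base's table points at the thunk.
const Slot first_vtable[]  = { body };
const Slot second_vtable[] = { thunk_from_second };

// Call sites are identical for both bases: load the slot, pass `this`.
int call_via_first(Object& o)  { FirstPart*  p = &o.first;  return p->vptr[0](p); }
int call_via_second(Object& o) { SecondPart* p = &o.second; return p->vptr[0](p); }

int demo_thunk() {
    Object o{ {first_vtable, 3}, {second_vtable, 4} };
    return call_via_first(o) * 100 + call_via_second(o);
}
```

Both calls reach the same `body` with the same object address; the call through the first part pays nothing extra, and the call through the second part pays one subtraction plus the transfer into `body`.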
Pavel
Guest
Posts: n/a
|
Stuart wrote:
> Stuart wrote:
>>> If the compiler wants to do anything with a class, it needs to know
>>> the complete definition of the class and all its base classes.
>
> On 10/11/12 Pavel wrote:
>> No, it does not need to know "the complete definition".
>
> You got me there. I meant to write complete declaration. Sorry.
>
>> I am not sure about "anything", but specifically for generating code
>> that calls a virtual function all the compiler needs to know is how to
>> calculate the address of a particular subobject of the object of the
>> most-derived class pointed by the pointer given to compiler.
>
> This is an implementation detail of the compiler.

I doubt it. I think the necessity to know the address of a particular subobject follows from the necessity for the virtual functions to be able to access the members of that particular sub-object (e.g. its data members); therefore this necessity is hardly an implementation detail but a generic corollary from the language definition.

> "Thunking" compilers do not have to do any additional work when they
> invoke a virtual function:
>     Base* ptr = ...;
>     ptr->foo ();
>
> will result in the following (pseudo-) ASM:
>     push ptr;
>     call *[ptr + &foo]

This works as long as *[ptr + &foo] evaluates to a different address for every different ptr at which foo() can be called for the object of the class (and there can be plenty of these for a single definition of foo()). That leaves the implementation with a choice: either use thunks with jumps (here is your extra work) for all but one of these different addresses, or have multiple foo()s with the ptr adjustment code inlined in each but one of them (there is no extra jump, but the whole code of foo is duplicated, again for every different ptr at which foo() can be called for the object of the class).

> where &foo gives the slot of the vtable where foo's address is stored.
> In order to be able to generate such code, the compiler only has to know
> the complete declaration of Base and Base's base classes (or else it
> could not figure out the slot number of foo), even if ptr is initialized
> with an instance of the hitherto unknown class Derived.

Agree.

> [snip]
>> [...] I admit the "thunk" way avoids direct penalties for calling
>> virtual by pointers on first base).
>
> Here we go. If I may cite you from up-thread:
>
> On 10/7/12 Pavel wrote:
> > I think the above is not accurate. C++ code does suffer performance
> > penalties from using multiple inheritance. Moreover, and what's
> > especially frustrating, even the code that does not use multiple
> > inheritance (in fact, any code using virtual functions) suffers from
> > at least one performance penalty imposed by the way C++ supports
> > multiple inheritance: the necessity to read the offset of the call
> > target [...]
>
> Apparently you proved yourself wrong:

I have already admitted that the above statement was not accurate; I forgot about the thunk approach. But not every implementation chooses it, and for a reason: as we saw above, thunks penalize calls by non-first bases by an extra jump. As for the alternative of copying the entire function, it would penalize the program by unconditionally bloating the code size.

> It is not an inherent limitation of the C++ language that it _must_ be
> slower for single-inheritance hierarchies because it also supports
> multiple-inheritance. I think it would be more correct to say that C++
> can be as efficient as any single-inheritance vtable-based language,
> provided the implementation of the C++ compiler is good (IOW, uses thunks).

As I hope I have shown above, the thunk approach has its own penalties as compared to the direct pointer adjustment at the call site; thus I would not unconditionally qualify it as "good". Different programs may perform better or worse with this approach than with the direct adjustment.

> Regards,
> Stuart

-Pavel
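For contrast with thunks, the "direct pointer adjustment at the call site" mentioned here can also be imitated by hand. In this sketch each hand-made vtable entry stores a function pointer together with a `this` delta, and every call site loads and applies the delta, even when it is zero. All names (`Entry`, `Object`, `virtual_call`, and so on) are hypothetical, and the layout is an illustration of the idea, not a description of any particular compiler.

```cpp
#include <cstddef>

// One vtable entry in the call-site-adjustment scheme: the target plus the
// amount to add to `this` before calling it.
struct Entry {
    int (*fn)(void* self);
    std::ptrdiff_t delta;
};

struct FirstPart  { const Entry* vptr; int x; };
struct SecondPart { const Entry* vptr; int y; };
struct Object     { FirstPart first; SecondPart second; };

// The single function body; it always receives the whole-object address.
int body(void* self) {
    Object* o = static_cast<Object*>(self);
    return o->first.x * 10 + o->second.y;
}

// The first base needs no adjustment (delta 0); the second base must be
// moved back by its offset inside Object.
const Entry first_vtable[]  = { { body, 0 } };
const Entry second_vtable[] = {
    { body, -static_cast<std::ptrdiff_t>(offsetof(Object, second)) }
};

// Every call site does the same work: load the entry, add the delta, call.
// This is the per-call cost that is paid even with single inheritance.
template <class Part>
int virtual_call(Part* p, std::size_t slot) {
    const Entry& e = p->vptr[slot];
    return e.fn(reinterpret_cast<char*>(p) + e.delta);
}

int demo_call_site_adjustment() {
    Object o{ {first_vtable, 1}, {second_vtable, 2} };
    return virtual_call(&o.first, 0) + virtual_call(&o.second, 0);
}
```

Here there is exactly one copy of `body` and no extra jump, but the table is wider and each call carries an additional load and add. That is the trade-off against thunks being weighed in this post.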
Pavel
Guest
Posts: n/a
|
Richard Damon wrote:
> On 10/10/12 1:36 AM, Pavel wrote:
>> True but thunk approach comes at higher cost for calling at non-first
>> base of extra jump as compared with 'classic' approach. Extra jump in
>> chunk is equivalent to at least one extra memory read (only to
>> instruction instead of data cache) and some instruction decoding. The
>> approach also somewhat increases memory usage with the second virtual
>> table and thunks. But you are right about zero-cost if calls by
>> non-first base are not used -- I completely forgot about thunk approach.
>>
>> -Pavel
>
> Actually, most of the jumps can be removed, by placing the thunk
> directly in front of the code for the subroutine.

Not unless you generate multiple copies of the subroutine because there can be more than one offset for the same function (D::f() can be called by either of its bases that declare virtual f() and there can be more than one such base, some inheriting from the other).

One problem with generating multiple functions is that one function can be relatively long (in code terms) and *all of them have to be generated* regardless of whether even one of them is used (as opposed to the code-bloat caused by templates that only generates really used functions). For example, if you have this hierarchy:

    class B { public: virtual int f(); };
    class B1 {...}; class B2 {...}; class B3 {...};
    class D1: public B1, public B {...}; // no f() override!
    class D2: public B3, public D1 {...}; // no f() override!
    class D3: public B2, public D2 {... virtual int f(); };

you will have to generate 4 virtual tables and 4 different functions fully copying the body of function D3::f(): to call it by B*, D1*, D2* and D3*; of these, the former will not have adjustment and the latter 3 will.

> In fact, if we define that when calling a virtual function, the this
> pointer is always set to point to the sub object of the type where the
> function was first declared as virtual,

How can you define 'this' to be always set this way? This same pointer can be used for other purposes, too (for example, for accessing non-virtual members of an object of its 'compile-time' type). It *has* to point to a particular sub-object (which one, in general, depends on how it was created; but if a class is not derived from the same base class more than once, it has to be the only sub-object that is of the class matching the pointer type).

> then the function receiving the
> call knows exactly how to adjust the this pointer to point to the now
> full object (of the type the function belongs to). and can place that
> adjustment as a thunk just before the normal code for the function if we
> want to allow it to be called "normally" in a non-virtual manner without
> needed to pre-adjust the this pointer for those calls.
>
> The only case where these adjustments become non-trivial would be in
> cases of virtual inheritance where the pointer adjustment will need to
> do a look up rather than a simple offset.
>
> The simple procedure here is:
>
> When calling a virtual function that was first defined in a non-first
> base class (or a non-virtual function that is last defined in a
> non-first base class), before making the call (this will be a simple
> addition to the base pointer if we have non-virtual inheritance).

This does not seem to be a complete statement and seems to be talking about two different things with two different costs:

- To call a non-virtual function on a non-first base, the function of the base class or its ancestor will be called. If it is an ancestor and its offset in the "compile-time" class of the pointer is not zero, you might need to offset the pointer, but that offset is known at compile time so it is less of an issue (no extra memory read or a jump).

- To call a virtual function by a pointer, the compiler *cannot tell* whether the pointer's compile-time type is the first base or not (and even whether it is the "proper" base or the most-derived class itself). This information will only be known at run-time and can differ at that same call site from one call to another. The "thunk" approach only works because it, too, uses the run-time value of the pointer; the compiler cannot generate any addition "before making the call" unless the generated code contains the memory read from the area (obviously, indirectly) pointed to by the pointer through which the call is made.

> At the function side, if the function is a virtual function of a type
> that has the initial definition in a non-first base class, include code
> to adjust the this pointer from that base class to the actual class just
> before the normal entry point, and have the vtable point there.

Yes, but without a jump it creates a whole new function for every such base, whether the function is going to be called by that base anywhere at all or not. I think that's why thunks with jumps are used in practice, but I am unsure about the multiple versions of the functions (for trivial functions, it may not be a bad idea though; but I expect complications with the name and cross-compiler linking). I am not sure if the ABI supports such things (I simply don't know); otherwise, each compiler will name its "additional" virtual functions for the same 'user-visible' name differently.

> The regular entry point can be used for non-virtual calls, with the
> caller passing the "normal" this pointer.

True, but probably beside the point.

> This is probably minimal overhead

It probably is -- in terms of instructions involved; but code bloating seems to be nasty; and it does lead to measurable performance penalty for sizable programs.

> (works best if the this pointer is
> passed in a register so the address offset is quick and easy). The only
> case where you get "extra" adjustments is if you make the call through a
> pointer to a class derived from one is non-first base derived from the
> class that first defined the virtual function to a function defined in
> class similarly after the multiple derivation, as you will adjust to the
> base and then back.

-Pavel
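The example hierarchy from this post can be compiled and probed directly. The sketch below fills in the elided class bodies with hypothetical padding members (`pad1`, `pad2`, `pad3`) and a hypothetical `seen_this` field, so it is an assumption-laden stand-in for the classes in the post, not a reproduction of them. It checks two things: that `B` does not sit at the start of a `D3` object, and that a call to `f()` through any of the four pointer types arrives in `D3::f` with `this` equal to the address of the full `D3` object.

```cpp
#include <cstddef>

struct B  { virtual ~B() = default;  virtual int f() { return 0; } };
struct B1 { virtual ~B1() = default; long pad1 = 0; };
struct B2 { virtual ~B2() = default; long pad2 = 0; };
struct B3 { virtual ~B3() = default; long pad3 = 0; };
struct D1 : B1, B {};                 // no f() override
struct D2 : B3, D1 {};                // no f() override
struct D3 : B2, D2 {
    const void* seen_this = nullptr;  // records what `this` was inside f()
    int f() override { seen_this = this; return 3; }
};

// Byte offset of the T subobject inside a D3 object.
template <class T>
std::ptrdiff_t offset_in_d3(D3& d) {
    return reinterpret_cast<char*>(static_cast<T*>(&d)) -
           reinterpret_cast<char*>(&d);
}

// Calls f() through a T* and reports whether the override ran with `this`
// restored to the address of the complete D3 object.
template <class T>
bool dispatch_restores_this(D3& d) {
    d.seen_this = nullptr;
    T* p = &d;
    return p->f() == 3 && d.seen_this == static_cast<const void*>(&d);
}

bool all_paths_reach_d3() {
    D3 d;
    return dispatch_restores_this<B>(d)  && dispatch_restores_this<D1>(d) &&
           dispatch_restores_this<D2>(d) && dispatch_restores_this<D3>(d);
}

// B is a non-first base at every level, so it cannot be at offset zero
// when the first base (B2) already occupies the start of the object.
bool b_subobject_is_displaced() { D3 d; return offset_in_d3<B>(d) != 0; }
```

The code shows that an adjustment must happen for calls through `B*`, but not how the compiler implements it. Whether the body of `D3::f` is duplicated, reached through jumping thunks, or entered through stacked entry points can only be seen in the generated assembly for a given compiler.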
Stuart
Guest
Posts: n/a
|
On 10/12/12 Pavel wrote:
[snip]
> As I hope I have shown above, the thunk approach has its own penalties
> as compared to the direct pointer adjustment at the call site; thus I
> would not unconditionally qualify it as "good". Different programs may
> perform better or worse with this approach than with the direct adjustment.

Agreed. So this is the lesson to be learned: If one uses a single-inheritance hierarchy, it is wiser to choose a C++ compiler that uses thunking. Likewise, if one uses multiple inheritance a lot, one should rather use a call-site adjustment compiler (or ensure that the most used interface/base class is the first base class).

Of course, one only has to care about this if the slow-down is really due to the additional thunking code or due to the pointer adjustment. Most of the time, this will be the least of one's problems.

Regards,
Stuart

PS: This thread turned out to be more interesting than I had initially thought. I had never heard about the pointer-adjustment technique before. Thanks for sharing, Pavel.
Richard Damon
Guest
Posts: n/a
|
On 10/12/12 1:39 AM, Pavel wrote:
>> Actually, most of the jumps can be removed, by placing the thunk >> directly in front of the code for the subroutine. > Not unless you generate multiple copies of the subroutine because there > can be more than one offset for the same function (D::f() can be called > by either of its the bases that declare virtual f() and there can be > more than one such base, some inheriting from the other). > > One problem with generating multiple functions is that one function can > be relatively long (in code terms) and *all of them has to be generated* > regardless of whether even one of them is used (as opposed to the > code-bloat caused by templates that only generates really used > functions). For example, if you have this hierarchy: class B { public: > virtual int f(); }; > class B1 {...}; class B2 {...}; class B3 {...}; > class D1: public B1, public B {...}; // no f() override! > class D2: public B3, public D1 {...}; // no f() override! > class D3: public B2, public D2 {... virtual int f(); };, > > you will have to generate 4 virtual tables and 4 different functions > fully copying the body of function D3::f(): to call it by B*, D1*, D2* > and D3*; of these, the former will not have adjustment and the latter 3 > will. > As I have been thinking about it, supporting pointer to member functions basically will require any implementation to have a virtual table for every distinct base class (That defines virtual functions) excepting single derivation. So the the 4 virtual tables are basically required. There does NOT need to be 4 different functions though. There may need to be 4 entry points, but the whole body doesn't need to be duplicated, the entries which need to adjust the this pointer, can adjust the pointer parameter and then fall into/jump into the main subroutine. You may not be able to write the equivalent to this in C++, but the complier doesn't need to. If you were to attempt to write it is C++ what you would need to do is something like: D3::f(D* d) { ... 
} // This parameter being shown explisitly D3::f(B* b) { return f(static_cast<D*>(b)); } and then let the compiler use tail recursion like elimination to replace the call with a jump. The next step is to realize that we can eliminate the jump by placing the code directly in from of the original function. In the case of multiple layers, they can be stacked B* -> D1* -> D2* -> D3* to eliminate the jumps, or the compiler can decide when the cost of the additional conversion exceeds the cost of the jump, and just add the jump. Thus the cost will never be higher than an adjustment followed by a jump, will be 0 cost if no adjustment is needed, and only the cost of a single adjustment if called from a single level of non-first base class (no jump needed). For many systems, if we are careful with are ABI design, so that the this pointer is passed via a register, the adjustment can be as simple as a single instruction in the basic case, a add or subtract immediate. Thus a possible layout of the entry to the function would be the equivalent to D3::f__thunk_B: this <- this - offsetof B in D1 D3::f__thunk_D1: this <- this - offsetof D1 in D2 D3::f__thunk_D2: this <- this - offsetof D2 in D3 D3::f start of normal code for D3::f If a jump is cheaper than 2 subtracts then it D3::f__thunk_B: this <- this - offsetof B in D3 jump B3::f D3::f__thunk_D1: this <- this - offsetof D1 in D2 D3::f__thunk_D2: this <- this - offsetof D2 in D3 D3::f start of normal code for D3::f This makes the thunking cost for a virtual function on par with the fixed adjustment cost for calling any non-first base member function, with at most the addition of a jump instruction (and this additional cost ONLY occurs if there are multiple non-first base derivations happening). 
> >> >> In fact, if we define that when calling a virtual function, the this >> pointer is always set to point to the sub object of the type where the >> function was first declared as virtual, > How can you define 'this' to be always set this way? This same pointer > can be used for other purposes, too (for example, for accessing > non-virtual members of an object of its 'compile-time' type. It *has* to > point to a particular sub-object (which one, in general depends on the > way how it was created; but if a class is not derived from the same base > class more than once, it has to be to that only sub-object that is of > the class matching the pointer type). > The key feature is that if you call using a pointer in the "B" sub-objects vtable, than you need to call it with a B* pointer as this. If the function is actually a D3 member function, then IT will adjust the this pointer as described above. For a direct virtual call (not via a member-function pointer), the compiler can choose the virtual table for the most derived class that it knows about that defines an override. If the object isn't of a more derived class that defines an override (say D4), then the code will just execute the adjustment at the call site and go directly into the function. If there is a subsequent override, then the call will go into the above thunk entry code, correct the this pointer. The compiler also has the option of using the most derived class's vtable and not adjust the pointer, knowing that this will go into a thunk that will adjust the pointer. If for example we call a g() that was declared virtual in B and D2, it will need to go into a thunk that adds the offset of D2 in D3 and then jump into D2::g(). This is a trade off of gaining space (using the common thunk, instead of adjusting at each call) at the expense of time (the cost of a jump). 
If jumps are really expensive, it may be possible with proper linker instruction to build the preamble for g to be: D2::g__thunk_D3: this <- this + offsetof B in D3 D2::g__thunk_B: this <- this - offsetof B in D1 D2::g__thunk_D1: this <- this - offsetof D1 in D2 D2::g start of normal code for D2::g (and further add later thunks in front until the cost of the extra adjustments exceed the cost of the jump). Note again, the main body of the member function has the this pointer pointing to the (sub)object of the type of the member function, the possible different type of this is only for the input to the thunk, which effectively casts the pointer to the needed type. > then the function receiving the >> call knows exactly how to adjust the this pointer to point to the now >> full object (of the type the function belongs to). and can place that >> adjustment as a thunk just before the normal code for the function if we >> want to allow it to be called "normally" in a non-virtual manner without >> needed to pre-adjust the this pointer for those calls. > >> >> The only case where these adjustments become non-trivial would be in >> cases of virtual inheritance where the pointer adjustment will need to >> do a look up rather than a simple offset. >> >> The simple procedure here is: >> >> When calling a virtual function that was first defined in a non-first >> base class (or a non-virtual function that is last defined in a >> non-first base class), before making the call (this will be a simple >> addition to the base pointer if we have non-virtual inheritance). > This does not seem to be complete statement and seems to be talking > about two different things with two different costs: > > - To call a non-virtual function on a non-first base the function of the > base class or its ancestor will be called. 
> If it is an ancestor and its offset in the "compile-time" class of the
> pointer is not zero, you might need to offset the pointer, but that
> offset is known at compile time so it is less of an issue (no extra
> memory read or a jump)
>
> - To call a virtual function by a pointer, compiler *cannot tell*
> whether the pointer's compile-time type is the first base or not (and
> even whether it is the "proper" base or the most-derived class itself).
> This information will only be known at run-time and can differ at that
> same call site from one call to another. "Thunk" approach only works
> because it, too, uses the run-time value of the pointer; compiler cannot
> generate any addition "before making the call" unless the generated code
> contains the memory read from the area (obviously, indirectly) pointed
> to by the pointer by which the call is made.

Perhaps I wasn't as clear on what I was describing here. In the case of a
call of the form pointer-to-object -> member-function-name, the compiler
knows a lot of the layout, and can use this to choose how it wants to code
the call among its options. It can validly choose any of the vtables as
long as it uses the proper object base for the call. Since, with the
structure I have described, "down casting" thunks are more expensive, it
might make sense to down cast at the call site and then make a call that,
at worst, will only need to up cast.

>> At the function side, if the function is a virtual function of a type
>> that has the initial definition in a non-first base class, include code
>> to adjust the this pointer from that base class to the actual class just
>> before the normal entry point, and have the vtable point there.
> Yes, but without jump it creates the whole new function for every such
> base whether it (the function) is going to be called by that base
> anywhere at all or not. I think that's why thunks with jumps are used in
> practice but I am unsure about the multiple versions of the functions
> (for trivial functions, it may not be a bad idea though; but I expect
> complications with the name and cross-compiler linking. I am not sure if
> ABI supports such things though (simply don't know); otherwise, each
> compiler will name its "additional" virtual functions for the same
> 'user-visible' name differently).

As I have pointed out, these entries do NOT need "whole new functions",
just a bit of adjusting code at the entry point to make the call with the
"wrong" type of this now correct. From the call side, they may act like
different functions, since they are all called with different types of
parameters, but the body (after the thunk) is likely to be the same for
all of them (not just identical looking code, but the same code at the
same addresses). Of course, if the function is so simple that the code for
a version with the different base is "better" than doing the adjustment,
the compiler is free to make them separate functions.

>> The regular entry point can be used for non-virtual calls, with the
>> caller passing the "normal" this pointer.
> true, but probably besides the point.
>
>> This is probably minimal overhead
> It probably is -- in terms of instructions involved; but code bloating
> seems to be nasty; and it does lead to measurable performance penalty
> for sizable programs.

It seems to be minimal, and only imposed when needed. As opposed to the
other method presented, which places in the vtable an offset FOR ALL
FUNCTIONS, and the offset is applied AT THE CALL SITE for ALL CALLS.

This will add more space and time for most programs (since it needs the
offset even for 1st base entries, and adds code to every call) as opposed
to only when needed.

>> (works best if the this pointer is passed in a register so the address
>> offset is quick and easy). The only case where you get "extra"
>> adjustments is if you make the call through a pointer to a class derived
>> from one that is a non-first base derived from the class that first
>> defined the virtual function to a function defined in a class similarly
>> after the multiple derivation, as you will adjust to the base and then
>> back.
>
> -Pavel |
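The two vtable schemes compared in this post can be mimicked by hand. What follows is only a toy sketch: the names (Obj, SlotWithDelta, SlotPlain, thunk_from_second) are invented for illustration, the "vtable slots" are ordinary structs, and no real compiler is claimed to lay things out this way. It shows a slot that carries a delta applied at every call site next to a slot that carries only an entry point, with an adjusting entry that exists only for the non-first subobject.

```cpp
#include <cassert>
#include <cstddef>

// Toy object with two "subobjects" laid out back to back (hypothetical names).
struct First  { int x; };
struct Second { int y; };
struct Obj    { First first; Second second; };

// The real function body always expects a pointer to the complete Obj.
int body(void* self) {
    Obj* o = static_cast<Obj*>(self);
    return o->first.x + o->second.y;
}

// Scheme 1: every slot stores a delta, and every call site applies it,
// even when the delta is zero.
struct SlotWithDelta { int (*fn)(void*); std::ptrdiff_t delta; };
int call_with_delta(const SlotWithDelta& slot, void* subobject) {
    return slot.fn(static_cast<char*>(subobject) + slot.delta);
}

// Scheme 2: a slot stores only an entry point. An adjusting entry is
// written only for the subobject that actually needs one.
int thunk_from_second(void* second) {
    return body(static_cast<char*>(second) - offsetof(Obj, second));
}
struct SlotPlain { int (*fn)(void*); };
int call_plain(const SlotPlain& slot, void* subobject) {
    return slot.fn(subobject);
}

// Both schemes must reach the same body with the same adjusted pointer.
void check_schemes_agree() {
    Obj o{{3}, {4}};
    SlotWithDelta with_delta{body, -static_cast<std::ptrdiff_t>(offsetof(Obj, second))};
    SlotPlain plain{thunk_from_second};
    assert(call_with_delta(with_delta, &o.second) == call_plain(plain, &o.second));
}
```

In this sketch the first-subobject call under scheme 2 goes straight to body with no extra work, while scheme 1 still loads and adds a zero delta, which is the cost difference argued above.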
|
|
|
|
|||
|
|||
| Richard Damon |
|
Pavel
Guest
Posts: n/a
|
Richard Damon wrote:
> On 10/12/12 1:39 AM, Pavel wrote:
>>> Actually, most of the jumps can be removed, by placing the thunk
>>> directly in front of the code for the subroutine.
>> Not unless you generate multiple copies of the subroutine because there
>> can be more than one offset for the same function (D::f() can be called
>> by any of its bases that declare virtual f() and there can be
>> more than one such base, some inheriting from the other).
>>
>> One problem with generating multiple functions is that one function can
>> be relatively long (in code terms) and *all of them have to be generated*
>> regardless of whether even one of them is used (as opposed to the
>> code-bloat caused by templates that only generates really used
>> functions). For example, if you have this hierarchy:
>>
>> class B { public: virtual int f(); };
>> class B1 {...}; class B2 {...}; class B3 {...};
>> class D1: public B1, public B {...}; // no f() override!
>> class D2: public B3, public D1 {...}; // no f() override!
>> class D3: public B2, public D2 {... virtual int f(); };
>>
>> you will have to generate 4 virtual tables and 4 different functions
>> fully copying the body of function D3::f(): to call it by B*, D1*, D2*
>> and D3*; of these, the former will not have adjustment and the latter 3
>> will.
>>
>
> As I have been thinking about it, supporting pointers to member functions
> basically will require any implementation to have a virtual table for
> every distinct base class (that defines virtual functions) excepting
> single derivation. So the 4 virtual tables are basically required.
>
> There do NOT need to be 4 different functions though. There may need
> to be 4 entry points, but the whole body doesn't need to be duplicated;
> the entries which need to adjust the this pointer can adjust the
> pointer parameter and then fall into/jump into the main subroutine. You
> may not be able to write the equivalent to this in C++, but the compiler
> doesn't need to. If you were to attempt to write it in C++, what you
> would need to do is something like:
>
> D3::f(D3* d) { ... } // This parameter being shown explicitly
>
> D3::f(B* b) { return f(static_cast<D3*>(b)); }
>
> and then let the compiler use tail-recursion-like elimination to replace
> the call with a jump.
>
> The next step is to realize that we can eliminate the jump by placing
> the code directly in front of the original function. In the case of
> multiple layers, they can be stacked B* -> D1* -> D2* -> D3* to
> eliminate the jumps, or the compiler can decide when the cost of the
> additional conversion exceeds the cost of the jump, and just add the
> jump. Thus the cost will never be higher than an adjustment followed by
> a jump, will be 0 cost if no adjustment is needed, and only the cost of
> a single adjustment if called from a single level of non-first base
> class (no jump needed).
>
> For many systems, if we are careful with our ABI design, so that the
> this pointer is passed via a register, the adjustment can be as simple
> as a single instruction in the basic case, an add or subtract immediate.
>
> Thus a possible layout of the entry to the function would be the
> equivalent to
>
> D3::f__thunk_B:  this <- this - offsetof B in D1
> D3::f__thunk_D1: this <- this - offsetof D1 in D2
> D3::f__thunk_D2: this <- this - offsetof D2 in D3
> D3::f            start of normal code for D3::f
>
> If a jump is cheaper than 2 subtracts then it would be
>
> D3::f__thunk_B:  this <- this - offsetof B in D3
>                  jump D3::f
> D3::f__thunk_D1: this <- this - offsetof D1 in D2
> D3::f__thunk_D2: this <- this - offsetof D2 in D3
> D3::f            start of normal code for D3::f
>
> This makes the thunking cost for a virtual function on par with the
> fixed adjustment cost for calling any non-first base member function,
> with at most the addition of a jump instruction (and this additional
> cost ONLY occurs if there are multiple non-first base derivations
> happening).

Agree. This is good design.
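The hierarchy and the hand-written "thunk" from the quoted text can be tried out in plain C++. This is a rough sketch under stated assumptions, not what any particular compiler emits: the data members and the virtual destructors on B1, B2 and B3 are added here only so that B lands at a nonzero offset on common ABIs (object layout is implementation-defined), and f_via_B and offset_of_B_in_D3 are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>

// Hierarchy from the post. Members and virtual destructors are assumptions
// added so that every subobject has a size and a predictable position.
struct B  { virtual int f() { return 0; } virtual ~B() = default; int b = 1; };
struct B1 { virtual ~B1() = default; int b1 = 10; };
struct B2 { virtual ~B2() = default; int b2 = 20; };
struct B3 { virtual ~B3() = default; int b3 = 30; };
struct D1 : B1, B  { int d1 = 100; };   // no f() override
struct D2 : B3, D1 { int d2 = 200; };   // no f() override
struct D3 : B2, D2 { int d3 = 300; int f() override { return d3; } };

// Hand-written stand-in for the entry-point thunk: it receives the B
// subobject, converts it back to the full D3 object (a compile-time
// constant subtraction), then runs the real body non-virtually.
int f_via_B(B* b) { return static_cast<D3*>(b)->D3::f(); }

// Byte distance from the start of the D3 object to its B subobject.
std::ptrdiff_t offset_of_B_in_D3(D3& d) {
    return reinterpret_cast<char*>(static_cast<B*>(&d)) -
           reinterpret_cast<char*>(&d);
}

// The ordinary virtual call and the explicit thunk must agree.
void check_thunk_matches_virtual_call() {
    D3 d;
    B* pb = &d;                      // implicit upcast adjusts the address
    assert(pb->f() == f_via_B(pb));
}
```

Calling f_via_B with a B* obtained from a D3 object gives the same result as the virtual call through that B*, which is the point of the design: one body, several cheap entry adjustments.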
>>> In fact, if we define that when calling a virtual function, the this
>>> pointer is always set to point to the sub object of the type where the
>>> function was first declared as virtual,
>> How can you define 'this' to be always set this way? This same pointer
>> can be used for other purposes, too (for example, for accessing
>> non-virtual members of an object of its 'compile-time' type). It *has* to
>> point to a particular sub-object (which one, in general, depends on the
>> way it was created; but if a class is not derived from the same base
>> class more than once, it has to be that only sub-object that is of
>> the class matching the pointer type).
>>
> The key feature is that if you call using a pointer in the "B"
> sub-object's vtable, then you need to call it with a B* pointer as this.
> If the function is actually a D3 member function, then IT will adjust
> the this pointer as described above. For a direct virtual call (not via
> a member-function pointer), the compiler can choose the virtual table
> for the most derived class that it knows about that defines an override.
> If the object isn't of a more derived class that defines an override
> (say D4), then the code will just execute the adjustment at the call
> site and go directly into the function. If there is a subsequent
> override, then the call will go into the above thunk entry code, which
> corrects the this pointer. The compiler also has the option of using the
> most derived class's vtable and not adjusting the pointer, knowing that
> this will go into a thunk that will adjust the pointer. If for example we
> call a g() that was declared virtual in B and D2, it will need to go
> into a thunk that adds the offset of D2 in D3 and then jump into
> D2::g(). This is a trade off of gaining space (using the common thunk,
> instead of adjusting at each call) at the expense of time (the cost of a
> jump). If jumps are really expensive, it may be possible with proper
> linker instruction to build the preamble for g to be:
>
> D2::g__thunk_D3: this <- this + offsetof B in D3
> D2::g__thunk_B:  this <- this - offsetof B in D1
> D2::g__thunk_D1: this <- this - offsetof D1 in D2
> D2::g            start of normal code for D2::g
>
> (and further add later thunks in front until the cost of the extra
> adjustments exceeds the cost of the jump).
>
> Note again, the main body of the member function has the this pointer
> pointing to the (sub)object of the type of the member function; the
> possible different type of this is only for the input to the thunk,
> which effectively casts the pointer to the needed type.
>
>>> then the function receiving the call knows exactly how to adjust the
>>> this pointer to point to the now full object (of the type the function
>>> belongs to), and can place that adjustment as a thunk just before the
>>> normal code for the function if we want to allow it to be called
>>> "normally" in a non-virtual manner without needing to pre-adjust the
>>> this pointer for those calls.
>>>
>>> The only case where these adjustments become non-trivial would be in
>>> cases of virtual inheritance where the pointer adjustment will need to
>>> do a look up rather than a simple offset.
>>>
>>> The simple procedure here is:
>>>
>>> When calling a virtual function that was first defined in a non-first
>>> base class (or a non-virtual function that is last defined in a
>>> non-first base class), before making the call (this will be a simple
>>> addition to the base pointer if we have non-virtual inheritance).
>> This does not seem to be a complete statement and seems to be talking
>> about two different things with two different costs:
>>
>> - To call a non-virtual function on a non-first base the function of the
>> base class or its ancestor will be called.
>> If it is an ancestor and its offset in the "compile-time" class of the
>> pointer is not zero, you might need to offset the pointer, but that
>> offset is known at compile time so it is less of an issue (no extra
>> memory read or a jump)
>>
>> - To call a virtual function by a pointer, compiler *cannot tell*
>> whether the pointer's compile-time type is the first base or not (and
>> even whether it is the "proper" base or the most-derived class itself).
>> This information will only be known at run-time and can differ at that
>> same call site from one call to another. "Thunk" approach only works
>> because it, too, uses the run-time value of the pointer; compiler cannot
>> generate any addition "before making the call" unless the generated code
>> contains the memory read from the area (obviously, indirectly) pointed
>> to by the pointer by which the call is made.
>
> Perhaps I wasn't as clear on what I was describing here. In the case of
> a call of the form pointer-to-object -> member-function-name, the
> compiler knows a lot of the layout, and can use this to choose how it
> wants to code the call among its options. It can validly choose any of
> the vtables as long as it uses the proper object base for the call.
> Since, with the structure I have described, "down casting" thunks are
> more expensive, it might make sense to down cast at the call site and
> then make a call that, at worst, will only need to up cast.
>
>>> At the function side, if the function is a virtual function of a type
>>> that has the initial definition in a non-first base class, include code
>>> to adjust the this pointer from that base class to the actual class just
>>> before the normal entry point, and have the vtable point there.
>> Yes, but without jump it creates the whole new function for every such
>> base whether it (the function) is going to be called by that base
>> anywhere at all or not. I think that's why thunks with jumps are used in
>> practice but I am unsure about the multiple versions of the functions
>> (for trivial functions, it may not be a bad idea though; but I expect
>> complications with the name and cross-compiler linking. I am not sure if
>> ABI supports such things though (simply don't know); otherwise, each
>> compiler will name its "additional" virtual functions for the same
>> 'user-visible' name differently).
>
> As I have pointed out, these entries do NOT need "whole new functions",
> just a bit of adjusting code at the entry point to make the call with the
> "wrong" type of this now correct. From the call side, they may act like
> different functions, since they are all called with different types of
> parameters, but the body (after the thunk) is likely to be the same for
> all of them (not just identical looking code, but the same code at the
> same addresses). Of course, if the function is so simple that the code
> for a version with the different base is "better" than doing the
> adjustment, the compiler is free to make them separate functions.
>
>>> The regular entry point can be used for non-virtual calls, with the
>>> caller passing the "normal" this pointer.
>> true, but probably besides the point.
>>
>>> This is probably minimal overhead
>> It probably is -- in terms of instructions involved; but code bloating
>> seems to be nasty; and it does lead to measurable performance penalty
>> for sizable programs.
> It seems to be minimal, and only imposed when needed. As opposed to the
> other method presented, which places in the vtable an offset FOR ALL
> FUNCTIONS, and the offset is applied AT THE CALL SITE for ALL CALLS.
>
> This will add more space and time for most programs (since it needs the
> offset even for 1st base entries, and adds code to every call) as
> opposed to only when needed.
>
>>> (works best if the this pointer is passed in a register so the address
>>> offset is quick and easy). The only case where you get "extra"
>>> adjustments is if you make the call through a pointer to a class derived
>>> from one that is a non-first base derived from the class that first
>>> defined the virtual function to a function defined in a class similarly
>>> after the multiple derivation, as you will adjust to the base and then
>>> back.
>>
>> -Pavel
>
-Pavel |
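The remark quoted in this post about pointers to member functions can be checked with a few lines of standard C++. This is a small sketch with invented class names (Left, Base, Derived) and helper names; it only demonstrates the language-level behaviour (a pointer to a virtual member function dispatches to the final overrider, even through a non-first base) and says nothing about how a given implementation represents such pointers.

```cpp
#include <cassert>

struct Left { virtual ~Left() = default; int l = 1; };
struct Base { virtual int f() { return 0; } virtual ~Base() = default; };
// Base is deliberately the second (non-first) base of Derived.
struct Derived : Left, Base { int f() override { return 42; } };

// Calls through a pointer to member of Base. Because f is virtual, the
// call reaches the final overrider for the dynamic type of b.
int call_through_pmf(Base& b, int (Base::*pmf)()) { return (b.*pmf)(); }

// A pointer to a member of Base converts implicitly to a pointer to a
// member of Derived. The call still has to find the Base subobject.
int call_on_derived(Derived& d) {
    int (Derived::*pmf)() = &Base::f;
    return (d.*pmf)();
}

void check_pmf_dispatch() {
    Derived d;
    assert(call_through_pmf(d, &Base::f) == call_on_derived(d));
}
```

Whatever the representation, the implementation has to keep enough information to locate the right subobject and the right vtable slot for these calls, which is why a vtable per distinct polymorphic base is hard to avoid.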
|
|
|
|
|||
|
|||
| Pavel |
|
Pavel
Guest
Posts: n/a
|
Stuart wrote:
> On 10/12/12 Pavel wrote:
> [snip]
>> As I hope I have shown above, the thunk approach has its own penalties
>> as compared to the direct pointer adjustment at the call site; thus I
>> would not unconditionally qualify it as "good". Different programs may
>> perform better or worse with this approach than with the direct
>> adjustment.
>
> Agreed. So this is the lesson to be learned: If one uses a
> single-inheritance hierarchy, it is wiser to choose a C++ compiler that
> does do thunking. Likewise, if one uses multiple inheritance a lot, one
> should rather use a call-site adjustment compiler (or ensure that the
> most used interface/base class is the first base class).
>
> Of course, one only has to care about this if the slow-down is really due
> to the additional thunking code or due to the pointer adjustment. Most of
> the time, this will be the least problem.
>
> Regards,
> Stuart
>
> PS: This thread turned out to be more interesting than I had initially
> thought. I have never heard about the pointer-adjustment technique
> before. Thanks for sharing, Pavel.
>

I think the really useful person on this thread was Richard. See his last
post -- it is a good design and seems to be the best of both worlds. The
code generator will be able to select the optimal thunk; unless the depth
is too high, it probably will be the non-jumping one (adding a constant to
a register does not affect CPU pipelines and imposes only absolutely
necessary cache traffic).

I guess I lost my point but I do not feel bad -- it *was* productive.

-Pavel |
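Stuart's advice to make the most used base class the first one can be illustrated with a small address check. This is a sketch with invented names (Hot, Cold, Widget, same_address_as_widget). It relies on the usual behaviour of mainstream ABIs, where the first polymorphic base shares the address of the derived object; the C++ standard itself does not guarantee that layout.

```cpp
#include <cassert>

struct Hot  { virtual int hot()  { return 1; } virtual ~Hot() = default;  int h = 0; };
struct Cold { virtual int cold() { return 2; } virtual ~Cold() = default; int c = 0; };

// Hot is listed first because, in this made-up scenario, it is the
// interface used on the fast path.
struct Widget : Hot, Cold {
    int hot() override  { return 11; }
    int cold() override { return 22; }
};

// True when converting Widget* to Base* leaves the address unchanged,
// i.e. when no this-pointer adjustment is needed for that base.
template <class Base>
bool same_address_as_widget(Widget& w) {
    return static_cast<void*>(static_cast<Base*>(&w)) == static_cast<void*>(&w);
}

void check_calls_reach_overriders() {
    Widget w;
    Hot* h = &w;
    Cold* c = &w;
    assert(h->hot() == 11);
    assert(c->cold() == 22);
}
```

On typical implementations same_address_as_widget<Hot> is true and same_address_as_widget<Cold> is false, so only calls made through Cold* need any adjustment, whichever scheme (thunk or call-site offset) performs it.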
|
|
|
|
|||
|
|||
| Pavel |
|
|
|




