![]() |
Pattern for allocating array objects with embedded header?
From time to time I could use objects that consist of a header and an
array of items of immutable size. Doing this with two allocations is straight forward, but has the drawback of another indirection on each array member access and, of course, the additional allocation. So the basic idea is to allocate the storage at once. Nothing special to a C programmer. But wrapping this in a C++ class seems a little bit tricky to me. I use a simple string as an example. The string length is the header in this case, the characters are the items. But I have other similar use cases with more complex headers and items, like the nodes o a B-tree. #include<stdlib.h> #include<string.h> #include<stdio.h> class String {private: size_t Len; // Logically const, but since we initialize it // before constructor it has to be non-const. private: // pure reference type => non-copyable String(const String&); void operator=(const String&); private: // Avoid array creation and ordinary new operator! static void* operator new(size_t); static void* operator new[](size_t); static void operator delete[](void*); private: // allocators static void* operator new(size_t s, size_t l) { String* that = (String*)new char[s+l+1]; that->Len = l; // Dirty early construction return that; } public: static void operator delete(void* p) { delete[] (char*)p; } private: String(const char* str) { memcpy(this+1, str, Len); ((char*)(this+1))[Len] = 0; } public: static String* create(const char* str, size_t len) { return new (len) String(str); } /// Return the current strings length. size_t length() const { return Len; } /// Return the current strings content. const char* str() const { return (char*)(this+1); } }; int main(int argc, char** argv) { String* mystr = String::create(argv[0], strlen(argv[0])); printf("Len = %u, String = %s\n", mystr->length(), mystr->str()); delete mystr; return 0; } The basic requirement is that operator new and constructor is now one logical action. Are the common patterns to implement this? From my point of view the access to members from operator new is really dirty. And I am unsure whether the result of operator new is necessarily equivalent to this. Any better ideas? Marcel |
Re: Pattern for allocating array objects with embedded header?
On 26.12.2012 14:16, Paavo Helde wrote:
> Marcel Müller <news.5.maazl@spamgourmet.org> wrote in news:50dacf41$0 > $6637$9b4e6d93@newsspool2.arcor-online.net: > >> From time to time I could use objects that consist of a header and an >> array of items of immutable size. Doing this with two allocations is >> straight forward, but has the drawback of another indirection on each >> array member access and, of course, the additional allocation. > > Let's see: [...] > b) separate header and data (a la std::string), allocate header on stack > (presumably header is small) and the data portion dynamically on the > heap: > > std::string x("..."); > > Accessing the data: > > x.c_str(); [...] > So it seems to me that b) is generally not worse than a) and with small > string optimization b) can do much better than a) in some circumstances. > So I do not see any actual reason to prefer a) over b) here. In fact you say: use value types, they perform better. Maybe I should not have used a simple string as an example. Think of delivery documents with a reasonable large header. You do not want to copy this over and over. Furthermore think of the case of optional references. In my case I have in the order of 10,000 objects. Each of them has about 30 references to other objects. All of them are optional. They are used sparse (~20%) and they often refer to the same object. If each reference take let's say 10 machine size words instead of one for the reference only, we are talking about ~50 MB additional memory. The total memory of the application is restricted by the platform to about 250 MB. So this is not a good idea. For this reason I do not like the small string optimization very much. At least not for immutable strings. It also prevents deduplication. In another case I have a B-tree with nodes. Each node consist of a header and two dozen entries. Each entry is either a reference to some external object or a reference to a sub node. Putting the header of the sub nodes into the parent (together with the reference to it's entries) would increase each entry and with them the size of the B tree structures by about a factor 5. Compared to these two examples the overhead of an additional allocation for the header is low. But even better is one allocation (especially with respect to cache efficiency). Marcel |
Re: Pattern for allocating array objects with embedded header?
On Thursday, 27 December 2012 14:27:41 UTC+2, Marcel Müller wrote:
> On 26.12.2012 14:16, Paavo Helde wrote: > > Marcel Müller <news.5.maazl@spamgourmet.org> wrote in news:50dacf41$0 > > $6637$9b4e6d93@newsspool2.arcor-online.net: > > > >> From time to time I could use objects that consist of a header and an > >> array of items of immutable size. Doing this with two allocations is > >> straight forward, but has the drawback of another indirection on each > >> array member access and, of course, the additional allocation. > > > > Let's see: > [...] > > b) separate header and data (a la std::string), allocate header on stack > > (presumably header is small) and the data portion dynamically on the > > heap: > > > > std::string x("..."); > > > > Accessing the data: > > > > x.c_str(); > [...] > > So it seems to me that b) is generally not worse than a) and with small > > string optimization b) can do much better than a) in some circumstances.. > > So I do not see any actual reason to prefer a) over b) here. > > In fact you say: use value types, they perform better. std::string is not value type. It is a dynamic container of characters. > Maybe I should not have used a simple string as an example. Certainly. Look at our faces. Do you see anyone who is not already bored of all those QStrings CStrings wxStrings BSTRs and the like? Reinventing strings is sure way how to attune enthusiasm down. Notice the more likely usage example of your String class: String* mystr = NULL; try { mystr = String::create("bla bla", strlen("bla bla")); // ... // whatever C++ usage of mystr } catch(...) { delete mystr; throw; } delete mystr; That is to achieve *minimal* exception safety. std::string has thankfully removed the need for that crap at least. > Think of delivery documents with a reasonable large header. You do not > want to copy this over and over. There are basically 3 ways to pass data ahead: to copy, to pass a pointer/reference or to move. > Furthermore think of the case of optional references. > In my case I have in the order of 10,000 objects. Each of them has about > 30 references to other objects. All of them are optional. They are used > sparse (~20%) and they often refer to the same object. What are the "optional references"? A reference in C++ can not be optional. You mean some textual reference like in scientific publications ... or some sort of hyperlink ... or? > If each reference take let's say 10 machine size words instead of one > for the reference only, we are talking about ~50 MB additional memory. Uhh ... 50 MB / (10 * 30 * 10 000) ... ~ 16 bytes. You got a platform where "machine size word" is 16 bytes? sizeof(std::string) with short string optimization is anyway nowhere 160. It is typically 32 or less. > The total memory of the application is restricted by the platform to > about 250 MB. So this is not a good idea. So ... 128 bit platform ... with 250 MB memory? Very interesting. > For this reason I do not like the small string optimization very much. > At least not for immutable strings. It also prevents deduplication. Do not use strings then for your "references". Use pointers. Or smart pointers. Or ... if the references are very sparse then decouple them from your objects into separate container and form a graph. So "documents" of yours are vertexes and the "references" are edges. > In another case I have a B-tree with nodes. Each node consist of a > header and two dozen entries. Each entry is either a reference to some > external object or a reference to a sub node. Putting the header of the > sub nodes into the parent (together with the reference to it's entries) > would increase each entry and with them the size of the B tree > structures by about a factor 5. I calculate what I calculate but can no way get factor of 5, sorry. Maybe you use the term "reference" in too several different meanings here or I misunderstand something obvious. > Compared to these two examples the overhead of an additional allocation > for the header is low. But even better is one allocation (especially > with respect to cache efficiency). 10 000 allocations or 20 000? On typical platforms we are talking about difference around 5 milliseconds. Not sure about your particular cases but in general the allocations start to hurt when we have millions of those. |
| All times are GMT. The time now is 12:35 AM. |
Powered by vBulletin®. Copyright ©2000 - 2013, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.