Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Engineering a list container. Part 1.

Reply
Thread Tools

Engineering a list container. Part 1.

 
 
jacob navia
Guest
Posts: n/a
 
      12-07-2013
Lists
-----
Single linked lists use a single pointer to the next element. The data
for the element comes right behind that pointer to avoid the overhead
that yet another pointer would represent.

So, we use this structure:

typedef struct _list_element {
struct _list_element *Next;
char Data[MINIMUM_ARRAY_INDEX]; // See below
} list_element;

The list header uses this structure to store the elements. As you can
see, there is no space wasted in a pointer to the element stored. The
element stored is placed just behind the Next pointer. The downside of
this decision is that we can’t recycle this object to store other
different objects of different size.

The constant MINIMUM ARRAY INDEX is defined as 1 if we are compiling in
C90 mode or as nothing if we are compiling in C99 mode. In C99 mode we
have a flexible structure, that consists of a fixed and a variable part.
The fixed part is the pointer to the next element. The variable part is
the object we are storing in the list.

Alignment
---------
Some machines require that data be stored at particular addresses,
always a multiple of two. For instance SPARC machines require that
doubles be aligned at a multiple of 8. The structure for our list
element above would provoke a crash when used to store doubles 4.
In those machines the list element structure is defined as follows:

typedef struct _ListElement {
struct _ListElement *Next;
#ifdef SPARC32
void *alignment;
#endif
char Data[MINIMUM_ARRAY_INDEX];
} ListElement;

This assumes that sizeof(void *) is 4.
In machines that handle unaligned data gracefully without crashing
alignment re- quirements aren’t useless, since in most cases they
provoke a performance loss.

The list header
---------------

struct _List {
ListInterface *VTable;
size_t count;
unsigned Flags;
unsigned timestamp;
size_t ElementSize;
list_element *Last;
list_element *First;
CompareFunction Compare;
ErrorFunction RaiseError;
ContainerHeap *Heap;
ContainerAllocator *Allocator;
};
In the public containers.h header file we refer always to an abstract
structure List. We define it here. This schema allows other
implementation to use the same header with maybe radically different
implementations of their data structure.

The individual fields are as follows:

1.
1.1: Vtable. Pointer to a table of methods for this container
1.2: count. Number of elements in the list
1.3: Flags. Bit fields for flags about this list.
1.4: ElementSize. Number of bytes the data occupies.

2. timestamp. This field is incremented at each modification of the
list, and allows the iterators to detect if the container changes during
an iteration: they store the value of this field at the start of the
iteration, and before each iteration they compare it with its current
value. If there are any changes, they return NULL .

3. Last. Stores a pointer to the last element of the list. This allows
the addition of an element at the end of the list to be fast, avoiding a
complete rescan of the list. This field is an optimization, all
algorithms of a single linked list would work without this field.

4. First. The start of the linked list.

5. Compare. A comparison function for the type of elements stored in the
list.

6. RaiseError. A function that will be called when an error occurs. This
field is necessary only if you want to keep the flexibility of having a
different error function for each list that the client software builds.
An alternative implementation would store a pointer to an error function
in the interface.

7. Allocator. A set of functions that allocates memory for this list. In
an implementation that needs less flexibility and is more interested in
saving space it could be replaced by the default allocator.

The sample implementation has certainly a quite voluminous header
because of a design decision to keep things very flexible. Other
implementations could trim most of the fields, and an absolute minimal
implementation would trim Last, Compare, RaiseError, Heap,
and Allocator. If the implementation assumes that only one iterator per
container is allowed, the timestamp field could be replace by a single
bit (’changed’) in the Flags field.

The list interface
------------------
The list interface is a table of functions that holds all the methods
of the container. For the list container we have:

typedef struct tagListInterface {
int (*Add)(List *L,const void *newval);
int (*AddRange)(List *L, size_t n,const void *data);
void *(*Advance)(ListElement **pListElement);
int (*Append)(List *l1,List *l2);
int (*Apply)(List *L,int(Applyfn)(void *,void *),void *arg);
void *(*Back)(const List *l);
int (*Clear)(List *L);
int (*Contains)(const List *L,const void *element);
List *(*Copy)(const List *L);
int (*CopyElement)(const List *list,size_t idx,void *OutBuffer);
List *(*Create)(size_t element_size);
List *(*CreateWithAllocator)(size_t elementsize,
const ContainerAllocator *mm);
void *(*ElementData)(ListElement *le);
int (*Equal)(const List *l1,const List *l2);
int (*Erase)(List *L,const void *);
int (*EraseAll)(List *l,const void *);
int (*EraseAt)(List *L,size_t idx);
int (*EraseRange)(List *L,size_t start,size_t end);
int (*Finalize)(List *L);
ListElement *(*FirstElement)(List *l);
void *(*Front)(const List *l);
const ContainerAllocator *(*GetAllocator)(const List *list);
void *(*GetElement)(const List *L,size_t idx);
size_t (*GetElementSize)(const List *l);
unsigned (*GetFlags)(const List *L);
ContainerHeap *(*GetHeap)(const List *l);
List *(*GetRange)(const List *l,size_t start,size_t end);
int (*IndexOf)(const List *L,const void *SearchedElement,
void *ExtraArgs,size_t *result);
List *(*Init)(List *aList,size_t element_size);
int (*InitIterator)(List *L,void *buf);
List *(*InitWithAllocator)(List *aList,size_t element_size,
const ContainerAllocator *mm);
List *(*InitializeWith)(size_t elementSize,size_t n,
const void *data);
int (*InsertAt)(List *L,size_t idx,const void *newval);
int (*InsertIn)(List *l, size_t idx,List *newData);
ListElement *(*LastElement)(List *l);
List *(*Load)(FILE *stream, ReadFunction loadFn,void *arg);
Iterator *(*NewIterator)(List *L);
ListElement *(*NextElement)(ListElement *le);
int (*PopFront)(List *L,void *result);
int (*PushFront)(List *L,const void *str);
int (*RemoveRange)(List *l,size_t start, size_t end);
int (*ReplaceAt)(List *L,size_t idx,const void *newval);
int (*Reverse)(List *l);
int (*RotateLeft)(List *l, size_t n);
int (*RotateRight)(List *l,size_t n);
int (*Save)(const List *L,FILE *stream, SaveFunction saveFn,
void *arg);
int (*Select)(List *src,const Mask *m);
List *(*SelectCopy)(const List *src,const Mask *m);
CompareFunction (*SetCompareFunction)(List *l,CompareFunction fn);
DestructorFunction (*SetDestructor)(List *v,DestructorFunction fn);
int (*SetElementData)(List *l, ListElement *le,void *data);
ErrorFunction (*SetErrorFunction)(List *L,ErrorFunction);
unsigned (*SetFlags)(List *L,unsigned flags);
size_t (*Size)(const List *L);
size_t (*Sizeof)(const List *l);
size_t (*SizeofIterator)(const List *);
ListElement *(*Skip)(ListElement *l,size_t n);
int (*Sort)(List *l);
List *(*SplitAfter)(List *l, ListElement *pt);
int (*UseHeap)(List *L, const ContainerAllocator *m);
int (*deleteIterator)(Iterator *);
} ListInterface;

Note that ther is a single Vtable for all lists of the same type.
All list just a pointer to a common method table.

In a future post I will explain this functions in detail and propose
code for them.

Feel free to comment on this.

jacob
 
Reply With Quote
 
 
 
 
Malcolm McLean
Guest
Posts: n/a
 
      12-08-2013
On Saturday, December 7, 2013 11:42:28 PM UTC, jacob navia wrote:
>
> typedef struct _list_element {
>
> struct _list_element *Next;
>
> char Data[MINIMUM_ARRAY_INDEX]; // See below
>
> } list_element;
>


C handles this perfectly well as

typedef struct node
{
struct node *next;
int fielda;
char fieldb[100];
}

etc, but it's by no means easy to make it generic. As you say, if fielda is
a double, some systems will insert 4 bytes of padding between it and the next
pointer. No problem, unless you want a generic system.
>
> The sample implementation has certainly a quite voluminous header
> because of a design decision to keep things very flexible. Other
> implementations could trim most of the fields, and an absolute minimal
> implementation would trim Last, Compare, RaiseError, Heap,
> and Allocator.
> Feel free to comment on this.
>

OK, so keeping with our non-generic implementation, probably we'll be calling
malloc() to create new nodes, we'll have head pointer somewhere, we'll
frequently be iterating over the list with

for(ptr = head; ptr != 0; ptr = ptr->next)

How can we beat this for the generic? We can use a fast fixed block allocator.
We can implement bug-free insert / deletes at any position, whilst with the
simple non-generic implementation we can only insert at the head, and we've
got to remember not to let anyone hang on to the head pointer because it
becomes invalidated by an insert. We can automatically destroy the list.

What happens if the list itself changes during an iteration? Is it important
for the programmer to be able to toggle relatively easily between a list and
an array? Are out of memory conditions essentially impossible, or do we need
an elaborate system to handle them? With a flexible generic solution, you've
got to provide answers to all of those, and it has to be the more complex
answer. That's one snag. The generic can't be totally simple to use, because
there aren't simple ways of addressing these issues.

The other problem is the perennial one, dependency. You don't want to create
a dependency on another file, much less a library/compiler, just because you're
using a linked list to store your objects.
 
Reply With Quote
 
 
 
 
jacob navia
Guest
Posts: n/a
 
      12-08-2013
Le 08/12/2013 17:42, Malcolm McLean a écrit :
> On Saturday, December 7, 2013 11:42:28 PM UTC, jacob navia wrote:
>>
>> typedef struct _list_element {
>>
>> struct _list_element *Next;
>>
>> char Data[MINIMUM_ARRAY_INDEX]; // See below
>>
>> } list_element;
>>

>
> C handles this perfectly well as
>
> typedef struct node
> {
> struct node *next;
> int fielda;
> char fieldb[100];
> }
>
> etc, but it's by no means easy to make it generic. As you say, if fielda is
> a double, some systems will insert 4 bytes of padding between it and the next
> pointer. No problem, unless you want a generic system.


You have no problems with the generic system either. Just do

typedef struct myData {
int fielda;
char fieldb[100];
} Data;

You pass sizeof(Data) to the generic creation functions. No overhead at all.


>>
>> The sample implementation has certainly a quite voluminous header
>> because of a design decision to keep things very flexible. Other
>> implementations could trim most of the fields, and an absolute minimal
>> implementation would trim Last, Compare, RaiseError, Heap,
>> and Allocator.
>> Feel free to comment on this.
>>

> OK, so keeping with our non-generic implementation, probably we'll be calling
> malloc() to create new nodes, we'll have head pointer somewhere, we'll
> frequently be iterating over the list with
>
> for(ptr = head; ptr != 0; ptr = ptr->next)
>
> How can we beat this for the generic? We can use a fast fixed block allocator.


Yes, a fast fixed block allocator is provided by the ccl. No need to use
a slow malloc function. But of course the generic implementation
needs to do the same loop and will have the same speed as the
non-generic.

> We can implement bug-free insert / deletes at any position, whilst with the
> simple non-generic implementation we can only insert at the head, and we've
> got to remember not to let anyone hang on to the head pointer because it
> becomes invalidated by an insert.


No. The ccl provides with "FirstElement" and "NextElement" functions
that return the whole (including the "next" pointer), for you to hack
away with the same liberty as you would using your personal list
implementation.


We can automatically destroy the list.

Sorry I do not understand that. You mean your personal non-generic
implementation uses garbage collection? Or what?

>
> What happens if the list itself changes during an iteration?


If you use the provided iterators, the iteration stops.


> Is it important
> for the programmer to be able to toggle relatively easily between a list and
> an array?


Maybe, who knows. In any case it is very easy to do with the ccl.


> Are out of memory conditions essentially impossible, or do we need
> an elaborate system to handle them? With a flexible generic solution, you've
> got to provide answers to all of those, and it has to be the more complex
> answer. That's one snag.


Yes, in your own list implementation you just crash if there is no more
memory left because you assumed it will always be available. The ccl
doesn't do that but this provokes NO EXTRA COMPLEXITY in your code since
this only be an issue if an allocation fails!

This is no "snag" but a feature that allows you to program AS IF there
was always memory available but in case you do not crash but a user
supplied recovery function is called!

> The generic can't be totally simple to use, because
> there aren't simple ways of addressing these issues.
>


The error handling design is quite complex but at the same time very
easy to use. By default there is an error function that is called when
an error occurs. The ccl supplies a default function that prints the
error in stderr and exits the program. You can override that with your
own function of course.


> The other problem is the perennial one, dependency. You don't want to create
> a dependency on another file, much less a library/compiler, just because you're
> using a linked list to store your objects.
>


Yes, you program (and debug) the linked list package (that is always an
extra file excuse me) at each application that you program.

The idea of the ccl is that you use a debugged container library instead
of reinventng the wheel at each time.

In the case of a linked list it is possible to understand your view
point, but for other containers like a flexible (resizable) array or
a tree, a hashtable, etc the situation is different. You do not want
to program those, and then, instead of using a resizable array you
use a list even if it is slower!

I started with the linked list because is the simplest!

Thanks for your input Malcom.

jacob
 
Reply With Quote
 
Ian Collins
Guest
Posts: n/a
 
      12-08-2013
Malcolm McLean wrote:
>
> The other problem is the perennial one, dependency. You don't want to create
> a dependency on another file, much less a library/compiler, just because you're
> using a linked list to store your objects.


Where do you draw the line?

Somewhere between "You don't want to create a dependency just to output
a character" and "You don't want to create a dependency just to use a list"?

--
Ian Collins
 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      12-08-2013
Le 08/12/2013 19:57, Ian Collins a écrit :
> Malcolm McLean wrote:
>>
>> The other problem is the perennial one, dependency. You don't want to
>> create
>> a dependency on another file, much less a library/compiler, just
>> because you're
>> using a linked list to store your objects.

>
> Where do you draw the line?
>
> Somewhere between "You don't want to create a dependency just to output
> a character" and "You don't want to create a dependency just to use a
> list"?
>


In the case of a simple container like a single linked list it is easy
to just hack away a nth implementation of a list but for a hashtable
or a tree, or even an extensible array the effort is quite considerable
to get it right...


Then, you get stuck with using a list instead of a hashtable or
a flexible array!

I started with the linked list because is the simplest to explain.

jacob
 
Reply With Quote
 
Malcolm McLean
Guest
Posts: n/a
 
      12-08-2013
On Sunday, December 8, 2013 6:57:52 PM UTC, Ian Collins wrote:
> Malcolm McLean wrote:
>
>
> > The other problem is the perennial one, dependency. You don't want to
> > create a dependency on another file, much less a library/compiler, just
> > because you're using a linked list to store your objects.

>
>
> Where do you draw the line?
>
> Somewhere between "You don't want to create a dependency just to output
> a character" and "You don't want to create a dependency just to use a list"?
>

Programming has to be done by reasonably intelligent human beings. There's
no simple or algorithmic answer.

Is it important to be able to take a single file, and associated header, and
drop it into another program? Does it matter if you have three or four
generic linked lists kicking about in your program source (because Malc used
Jacob's library whilst Ian used a version based on C++ stl and Fred rolled
his own)?

There aren't easy answers. Sometimes it's easy, if we know that the code will
be used only in one program, it's not going to be of any interest to anyone
else developing similar programs, and we've got a small, controlled group
working on that one program, then once we've taken the decision to go the
Jacob container library route, there's no further issue, we should use it
for every linked list.

With outputting a character, that's not a "bit shuffling operation", so
inherently it has some dependency. So I try to move my IO to separate modules,
so the bulk of the program can be written as portable, independent, standalone
functions.

 
Reply With Quote
 
Johann Klammer
Guest
Posts: n/a
 
      12-08-2013
jacob navia wrote:
> Lists
> -----
> Single linked lists use a single pointer to the next element. The data
> for the element comes right behind that pointer to avoid the overhead
> that yet another pointer would represent.
>
> So, we use this structure:
>
> typedef struct _list_element {
> struct _list_element *Next;
> char Data[MINIMUM_ARRAY_INDEX]; // See below
> } list_element;
>
> The list header uses this structure to store the elements. As you can
> see, there is no space wasted in a pointer to the element stored. The
> element stored is placed just behind the Next pointer. The downside of
> this decision is that we can’t recycle this object to store other
> different objects of different size.
>
> The constant MINIMUM ARRAY INDEX is defined as 1 if we are compiling in
> C90 mode or as nothing if we are compiling in C99 mode. In C99 mode we
> have a flexible structure, that consists of a fixed and a variable part.
> The fixed part is the pointer to the next element. The variable part is
> the object we are storing in the list.
>


Did you look at the linked lists inside linux(include/linux/list.h...
types.h... struct list_head)? AFAIK they use a header structure, that
gets included in structures that want to use it. Then there's usually
some pointer arithmetics(usually things involving offsetof) to get from
the header pointer to the containing struct and vice-versa. It has the
advantage to be able have some struct in multiple lists (perhaps a rare
use-case.. but sometimes..), by using several header vars..
Any particular reason not to do it that way? I always thought, it was
rather clever...

 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      12-09-2013
Le 08/12/2013 22:02, Johann Klammer a écrit :
> Did you look at the linked lists inside linux(include/linux/list.h...
> types.h... struct list_head)? AFAIK they use a header structure, that
> gets included in structures that want to use it. Then there's usually
> some pointer arithmetics(usually things involving offsetof) to get from
> the header pointer to the containing struct and vice-versa. It has the
> advantage to be able have some struct in multiple lists (perhaps a rare
> use-case.. but sometimes..), by using several header vars..
> Any particular reason not to do it that way? I always thought, it was
> rather clever...


Well, the idea of having a header in each element is too heavy
for the containers since it is of the order of:

sizeof(elementHeader) * N

In my case this is just

sizeof(void *) * N // the "next item" pointer

The library doesn't know anything about any offsets into your data.
It treats the data as a container (a box for instance) into which
you store stuff and retrieve it. What is stored is invisible for the
container. You do not expect a box to handle the objects you store into
it any differently from each other!

You can have generic lists specialized for any conceivable type
if you use the generic list container, to be explained later.

Yes, in C. It uses the C preprocessor. Of course I have always read
that it can't be done. I believed that also, until... I wrote it.

It is actually very simple.

http://code.google.com/p/ccl/

All the code can be downloaded from there and hacked at will.

Enjoy!

jacob
 
Reply With Quote
 
Ben Bacarisse
Guest
Posts: n/a
 
      12-09-2013
jacob navia <(E-Mail Removed)> writes:

> Le 08/12/2013 22:02, Johann Klammer a écrit :
>> Did you look at the linked lists inside linux(include/linux/list.h...
>> types.h... struct list_head)? AFAIK they use a header structure, that
>> gets included in structures that want to use it. Then there's usually
>> some pointer arithmetics(usually things involving offsetof) to get from
>> the header pointer to the containing struct and vice-versa. It has the
>> advantage to be able have some struct in multiple lists (perhaps a rare
>> use-case.. but sometimes..), by using several header vars..
>> Any particular reason not to do it that way? I always thought, it was
>> rather clever...

>
> Well, the idea of having a header in each element is too heavy
> for the containers since it is of the order of:
>
> sizeof(elementHeader) * N
>
> In my case this is just
>
> sizeof(void *) * N // the "next item" pointer


The Linux lists are doubly linked, so two pointers are needed. The
overhead, though, is just that: two pointers plus any required padding.
Using the same technique for singly linked lists would presumably have
the same overhead as yours: one pointer plus any required padding.

<snip>
--
Ben.
 
Reply With Quote
 
BartC
Guest
Posts: n/a
 
      12-09-2013
"jacob navia" <(E-Mail Removed)> wrote in message
news:l832pm$63l$(E-Mail Removed)...

> It is actually very simple.
>
> http://code.google.com/p/ccl/


I've looked at the PDF, and one word I wouldn't use to describe it is
simple! (Comprehensive, perhaps.)

Its 370 dense pages are 100 more than K&R 2 which describes the entire
language.

--
Bartc
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
A container library in C Part 4. The arraylist container jacob navia C Programming 10 10-01-2009 02:26 PM
A container library in C: Part 3. List container implementation jacob navia C Programming 5 09-28-2009 08:19 PM
Call for Papers: World Congress on Engineering WCE 2007 (IAENG conferences with Engineering Letters) imecs__2007@iaeng.org Java 0 12-19-2006 06:51 AM
CFP: Special Issue on Web Engineering (The journal Engineering Letters) imecs_2006@iaeng.org Java 0 01-07-2006 04:48 AM
std::container::iterator vs std::container::pointer Vivi Orunitia C++ 11 02-04-2004 08:09 AM



Advertisments