Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > Re: Breakthrough

Reply
Thread Tools

Re: Breakthrough

 
 
MelissA
Guest
Posts: n/a
 
      06-04-2012
On Mon, 04 Jun 2012 13:38:47 +0200
jacob navia <(E-Mail Removed)> wrote:

> Hi
>
> In a recent discussion ("Open letter to Ian Collins") we found out
> that the main problem in the C containers library was the time taken
> by the Sort function.
>
> The reasons for that is that it was a general sort function calling a
> general comparison function that uses memcmp...
>
> The first step to improve performance was to write data type specific
> functions using a data type generic file. As a bonus, we get compile
> time checking and better syntax.
>
> Still, the generic sort function was taking a lot of time.
>
> I have written then a data type generic quick sort function that
> receives as a parameter a macro that is used to compare two elements.
> This vastly improves performance.
>
> The times are:
>
> Generic sort function without any specialized heap
> (using malloc)
> real 0m17.029s
> user 0m16.654s
> sys 0m0.368s
>
> Generic sort function using a specialized heap
> (reduces the malloc overhead)
> real 0m11.759s
> user 0m11.500s
> sys 0m0.252s
>
> "Templated" sort function using a macro parameter
> (reduces comparisons to a simple expression)
> real 0m6.453s
> user 0m6.165s
> sys 0m0.281s
>
> The expression used to compare two list elements is:
> #define COMPARE_EXPRESSION(A, B) \
> ((*B)->Data > (*A)->Data ? -1 : (*B)->Data != (*A)->Data)
>
> I thank Pete for that proposal, I wouldn't have come to it
> alone
>
> I would like to have the corresponding C++ times but I fear that
> if I write it it would be unfair since I am not a C++ wizard...
> Here is the C code for the example:
>
> #include <stdlib.h>
> #include "containers.h"
> #include "intlist.h"
> #define N 10000000
> int main(void)
> {
> intList *L1;
> size_t i;
> int d;
> long long sum=0,newSum=0;
> Iterator *it;
> int *pint;
>
> L1 = iintList.Create(sizeof(int));
> iintList.UseHeap(L1,NULL); // Use specialized Heap
> // Add N random numbers to the integer list
> for (i=0; i<N;i++) {
> d = rand();
> sum += d;
> iintList.Add(L1,d);
> }
> // Sort it
> iintList.Sort(L1);
> // Go through the sorted list using an iterator
> it = iintList.NewIterator(L1);
> i=0;
> for (pint = it->GetFirst(it); pint; pint = it->GetNext(it)) {
> newSum += *pint;
> }
> // Verify that both sums are identical
> if (sum != newSum)
> printf("Failed\n");
> else printf("Sum = %lld\n",sum);
> // Destroy the list
> iintList.Finalize(L1);
> }
>
> I have uploaded the latest code to:
>
> http://code.google.com/p/ccl/
>
> There you will find (in listgen.c) the "templated" version of quick
> sort in C.
>


Here are timings on mu computer.

bmaxa@maxa:~/jacob/ccl$ time ./cppvec
Sum = 10738138201479754

real 0m0.899s
user 0m0.832s
sys 0m0.064s
bmaxa@maxa:~/jacob/ccl$ time ./cpplist
Sum = 10738138201479754

real 0m7.493s
user 0m7.344s
sys 0m0.136s
bmaxa@maxa:~/jacob/ccl$ time ./testlist
Sum = 10738138201479754

real 0m4.320s
user 0m4.180s
sys 0m0.140s

Sources:
---- cppvec.cpp

#include <vector>
#include <cstdio>
#include <cstdlib>
#include <algorithm>

#define N 10000000

int main()
{
std::vector<int> L1;
long long sum=0,newSum=0;
for(int i = 0; i < N; ++i)
{
int d = rand();
sum+=d;
L1.push_back(d);
}

std::sort(L1.begin(),L1.end());

for(auto it : L1)
{
newSum+= it;
}

if (sum != newSum)
printf("Failed\n");
else printf("Sum = %lld\n",sum);

}

----cpplist.cpp

#include <list>
#include <cstdio>
#include <cstdlib>

#define N 10000000

int main()
{
std::list<int> L1;
long long sum=0,newSum=0;
for(int i = 0; i < N; ++i)
{
int d = rand();
sum+=d;
L1.push_back(d);
}

L1.sort();

for(auto it : L1)
{
newSum+= it;
}

if (sum != newSum)
printf("Failed\n");
else printf("Sum = %lld\n",sum);

}

Compile with -std=c++0x for gcc 4.6 or -std=c++11 for 4.7

As you can see your program is faster than list version
of C++, but is less elegant


Greets!

 
Reply With Quote
 
 
 
 
Zoltan Juhasz
Guest
Posts: n/a
 
      06-04-2012
On Monday, 4 June 2012 08:20:01 UTC-4, MelissA wrote:
> Here are timings on mu computer.
>
> bmaxa@maxa:~/jacob/ccl$ time ./cppvec
> Sum = 10738138201479754
>
> real 0m0.899s
> user 0m0.832s
> sys 0m0.064s
> bmaxa@maxa:~/jacob/ccl$ time ./cpplist
> Sum = 10738138201479754
>
> real 0m7.493s
> user 0m7.344s
> sys 0m0.136s
> bmaxa@maxa:~/jacob/ccl$ time ./testlist
> Sum = 10738138201479754
>
> real 0m4.320s
> user 0m4.180s
> sys 0m0.140s
>


I've got two problems with the analysis:

1. You are measuring a lot more than the sort. Narrow the scope of measurement down to the portion you want to measure.
2. Assuming that the C version uses doubly-linked list (just like the C++ version), the C version uses a small-object allocator to reduce the malloc'sheader overhead. To do a fair comparison, you'll need to do the same for the C++ version; a small-object allocator might enhance locality dramatically, plus eliminates the 8/16 byte overhead per allocation introduced by malloc (at least some glibc malloc does that).

I would expect that you'll end up seeing the same result.
 
Reply With Quote
 
 
 
 
MelissA
Guest
Posts: n/a
 
      06-04-2012
On Mon, 4 Jun 2012 07:08:45 -0700 (PDT)
Zoltan Juhasz <(E-Mail Removed)> wrote:

> On Monday, 4 June 2012 08:20:01 UTC-4, MelissA wrote:
> > Here are timings on mu computer.
> >
> > bmaxa@maxa:~/jacob/ccl$ time ./cppvec
> > Sum = 10738138201479754
> >
> > real 0m0.899s
> > user 0m0.832s
> > sys 0m0.064s
> > bmaxa@maxa:~/jacob/ccl$ time ./cpplist
> > Sum = 10738138201479754
> >
> > real 0m7.493s
> > user 0m7.344s
> > sys 0m0.136s
> > bmaxa@maxa:~/jacob/ccl$ time ./testlist
> > Sum = 10738138201479754
> >
> > real 0m4.320s
> > user 0m4.180s
> > sys 0m0.140s
> >

>
> I've got two problems with the analysis:
>
> 1. You are measuring a lot more than the sort. Narrow the scope of
> measurement down to the portion you want to measure. 2. Assuming that
> the C version uses doubly-linked list (just like the C++ version),
> the C version uses a small-object allocator to reduce the malloc's
> header overhead. To do a fair comparison, you'll need to do the same
> for the C++ version; a small-object allocator might enhance locality
> dramatically, plus eliminates the 8/16 byte overhead per allocation
> introduced by malloc (at least some glibc malloc does that).
>
> I would expect that you'll end up seeing the same result.


timings without sort:

bmaxa@maxa:~/jacob/ccl$ time ./cpplist
Sum = 10738138201479754

real 0m0.626s
user 0m0.504s
sys 0m0.120s
bmaxa@maxa:~/jacob/ccl$ time ./testlist
Sum = 10738138201479754

real 0m0.440s
user 0m0.316s
sys 0m0.120s

So it's pretty small difference in allocation. Biggest time is spent
in sort function. After sort, traverse can be also slower, if sort
changes order of nodes.


 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      06-04-2012
Le 04/06/12 16:08, Zoltan Juhasz a écrit :

(1)
The problem is that in older versions of C++ there isn't a single
linked list...

(2)
How would you use a small object allocator using the STL?
The C containers library comes with a small object allocator
already done for you. I thought that a classic problem like
that would be solved already.
 
Reply With Quote
 
Zoltan Juhasz
Guest
Posts: n/a
 
      06-04-2012
On Monday, 4 June 2012 10:20:38 UTC-4, jacob navia wrote:
> Le 04/06/12 16:08, Zoltan Juhasz a écrit :
>
> (1)
> The problem is that in older versions of C++ there isn't a single
> linked list...


Right, only C++11. That is why I wanted to make sure both
uses the same kind, otherwise the measurement is rather unfair.

> (2)
> How would you use a small object allocator using the STL?
> The C containers library comes with a small object allocator
> already done for you. I thought that a classic problem like
> that would be solved already.


I am not sure if the question is directed towards
- possibility or
- feasibility

As far as possibility is concerned:

STL containers provide an 'Allocator' template parameter,
where you can provide your own allocator. Your custom
allocator then can be built on top of malloc (such as
Loki::SmallObj allocator, by using pools), or you can even
use an alternative malloc implementation to malloc / free
memory.

As far a feasibility is concerned, I am moderately sure that
there are off-the-shelf, STL compatible small-object
allocators available; providing an additional template argument
should not be a problem.

Alternatively, there are other malloc implementations, such as
jmalloc, which segregate memory based on allocation size,
thus eliminating some of the overhead.

Either way, it is possible, and not horribly complex.

-- Zoltan
 
Reply With Quote
 
Juha Nieminen
Guest
Posts: n/a
 
      06-04-2012
In comp.lang.c++ MelissA <(E-Mail Removed)> wrote:
> #define N 10000000
>
> int main()
> {
> std::list<int> L1;
> long long sum=0,newSum=0;
> for(int i = 0; i < N; ++i)
> {
> int d = rand();
> sum+=d;
> L1.push_back(d);


Your test is basically useless as a measurement of sorting speed because
the vast majority of the time will be spent on those push_back() calls.

If you want to test sorting speed and nothing else, you have to discard
the time spent in memory allocation.
 
Reply With Quote
 
Ike Naar
Guest
Posts: n/a
 
      06-04-2012
On 2012-06-04, Juha Nieminen <(E-Mail Removed)> wrote:
> In comp.lang.c++ MelissA <(E-Mail Removed)> wrote:
>> #define N 10000000
>>
>> int main()
>> {
>> std::list<int> L1;
>> long long sum=0,newSum=0;
>> for(int i = 0; i < N; ++i)
>> {
>> int d = rand();
>> sum+=d;
>> L1.push_back(d);

>
> Your test is basically useless as a measurement of sorting speed because
> the vast majority of the time will be spent on those push_back() calls.
>
> If you want to test sorting speed and nothing else, you have to discard
> the time spent in memory allocation.


On the machine that I tried, most of the time was spent on the
implicit destruction of the list at program termination.
 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      06-04-2012
Le 04/06/12 17:32, Juha Nieminen a écrit :
> In comp.lang.c++ MelissA<(E-Mail Removed)> wrote:
>> #define N 10000000
>>
>> int main()
>> {
>> std::list<int> L1;
>> long long sum=0,newSum=0;
>> for(int i = 0; i< N; ++i)
>> {
>> int d = rand();
>> sum+=d;
>> L1.push_back(d);

>
> Your test is basically useless as a measurement of sorting speed because
> the vast majority of the time will be spent on those push_back() calls.
>


The purpose of this test is to compare the STL to the C Containers
Library (CCL). With this in mind I wanted a big test with both systems
doing roughly the same thing:

1) Allocate a list
2) Fill it with 100 million random numbers adding each number
to a grand total
3) Sort it
4) Go through the whole list again, reading the data and verify that
you get the same number.
5) Destroy the list.

> If you want to test sorting speed and nothing else, you have to discard
> the time spent in memory allocation.


Yes, but what I want to test is the speed to do ALL tasks mentioned above

 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      06-04-2012
Le 04/06/12 16:54, Zoltan Juhasz a écrit :
> On Monday, 4 June 2012 10:20:38 UTC-4, jacob navia wrote:
>> Le 04/06/12 16:08, Zoltan Juhasz a écrit :
>>
>> (1)
>> The problem is that in older versions of C++ there isn't a single
>> linked list...

>
> Right, only C++11. That is why I wanted to make sure both
> uses the same kind, otherwise the measurement is rather unfair.
>


You are in principle right. I will rewrite the double linked
list module and redo the test.


>> (2)
>> How would you use a small object allocator using the STL?
>> The C containers library comes with a small object allocator
>> already done for you. I thought that a classic problem like
>> that would be solved already.

>
> I am not sure if the question is directed towards
> - possibility or
> - feasibility
>
> As far as possibility is concerned:
>
> STL containers provide an 'Allocator' template parameter,
> where you can provide your own allocator.


The same for the CCL

> Your custom
> allocator then can be built on top of malloc (such as
> Loki::SmallObj allocator, by using pools), or you can even
> use an alternative malloc implementation to malloc / free
> memory.
>


Yes, but in the CCL you get one already built for your. You just
tell the system

iList.UseHeap(MyList);

and the provided small object allocator is plugged in to the
list.

C++ is a bit more complicated in this respect, but I would
like to use a common library. Loki needs to be downloaded,
compiled, and it is lacking some examples...

In this page
http://sourceforge.net/projects/loki.../topic/3828398

people say it runs slower than malloc...


> As far a feasibility is concerned, I am moderately sure that
> there are off-the-shelf, STL compatible small-object
> allocators available; providing an additional template argument
> should not be a problem.
>


Well, besides Loki what others would you propose?


> Alternatively, there are other malloc implementations, such as
> jmalloc, which segregate memory based on allocation size,
> thus eliminating some of the overhead.
>
> Either way, it is possible, and not horribly complex.
>
> -- Zoltan


Not horribly complex for you but for me it looks like a big deal...
 
Reply With Quote
 
jacob navia
Guest
Posts: n/a
 
      06-04-2012
Le 04/06/12 17:44, Ike Naar a écrit :
>
> On the machine that I tried, most of the time was spent on the
> implicit destruction of the list at program termination.


Yes, I see a noticeable delay between the printing of the
result sum and the end of the program.

Apparently cleaning up is quit e expensive since destructors must be
called 100 million times.

In the CCL you can ADD a destructor if you want but they are NOT
required as in C++.

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Amazing breakthrough in digital photography Lionel Lauer Digital Photography 1 06-29-2004 06:38 PM
Scientific breakthrough: magnetic lenses  @ .  Digital Photography 0 04-05-2004 01:59 AM
A breakthrough in lens technology: must read REED BOXIN Digital Photography 3 03-07-2004 08:40 PM
A remarkable breakthrough: Sony DSC-F828 Jill Digital Photography 24 10-28-2003 05:52 AM
Re: Whitley Strieber's New Breakthrough Baron Maximillian von Schtuldeworfshiseundurheimhoppen Computer Support 0 10-19-2003 01:09 AM



Advertisments