Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > std::min/max vs own functions

Reply
Thread Tools

std::min/max vs own functions

 
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
Hi everyone,

Below is a test code I made after noticing a big difference in terms
of running time between the use of std::min or std::max and a
templated version of it. I'm not sure where it does come from, so any
comments are welcome.
The code does not anything usefull, but it is a simplification of the
real code (the purpose is only to show the difference of timings ..)

I don't know if there is a proper way of posting a chunk of code, so I
just drop it here ...

#include <algorithm>
#include <iostream>
#include <sys/time.h>

typedef unsigned char uchar;
typedef unsigned int uint;
using namespace std;

void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL[i])/2 : pL[i]);
vlp = (i<w-1 ? (pL[i]+pL[i+1])/2 : pL[i]);

pD[i] = std::min<int>(vlp,std::max<int>(vlm,pR[i]));
}
}

template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }

void fn2(const uchar *pL, const uchar *pR, int w, uint *pD) {
int vlm, vlp;
for (int i=0; i<w; ++i) {
vlm = (i>0 ? (pL[i-1]+pL[i])/2 : pL[i]);
vlp = (i<w-1 ? (pL[i]+pL[i+1])/2 : pL[i]);

pD[i] = min2<int>(vlp,max2<int>(vlm,pR[i]));
}
}

int main(int argc, char *argv[]) {

int h = 400, w = 500;
uchar *T1[h], *T2[h];
uint *T3[h];

for (int i=0; i<h; ++i) {
T1[i] = new uchar[w];
T2[i] = new uchar[w];
T3[i] = new uint[w];
}

for (int j=0; j<h; ++j) {
for (int i=0; i<w; ++i) {
T1[j][i] = rand()%256;
T2[j][i] = rand()%256;
}
}

struct timeval t1, t2;
gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;

gettimeofday(&t1, NULL);
for (int z=0; z<1000; ++z) {
for (int j=0; j<h; ++j) {
uchar *p1 = T1[j];
uchar *p2 = T2[j];
uint *p3 = T3[j];
fn2(p1, p2, w, p3);
}
}
gettimeofday(&t2, NULL);
cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
1.e6 << endl;
}

Then I compiled it with :
g++ test.cpp -O3 -o test

And this are the results :
Took 1.58356
Took 0.789235

So the second version which use templated functions is approx 2times
faster. Is it normal ?

I tested it on MacosX (10.5) (the timings shown above), and on Linux
where timings are (5.45 for the first version and 4.71 for the second
one : the machine is older, thus the big difference, but still the
second is faster).

Can anyone reproduce this and tell me if this is normal.

Thanks
 
Reply With Quote
 
 
 
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 1:06*pm, Pete Becker <(E-Mail Removed)> wrote:
> On 2011-04-11 12:10:55 -0400, eLVa said:
>
>
>
> > template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
> > template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }

>
> Different isn't the same. Since max2 doesn't do the same thing as
> std::max, despite the misleading name, the measurements don't mean much.
>
> --
> * Pete
> Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
> Standard C++ Library Extensions: a Tutorial and Reference
> (www.petebecker.com/tr1book)


Sorry for the mistake, it's a copy paste error, of course you should
read

template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

But the timings stay the same ...
So where could this come from ?
 
Reply With Quote
 
 
 
 
Alf P. Steinbach /Usenet
Guest
Posts: n/a
 
      04-11-2011
* eLVa, on 11.04.2011 19:58:
> On Apr 11, 1:06 pm, Pete Becker<(E-Mail Removed)> wrote:
>> On 2011-04-11 12:10:55 -0400, eLVa said:
>>
>>
>>
>>> template<class T> T min2(const T&a, const T&b) { return a<b?a:b; }
>>> template<class T> T max2(const T&a, const T&b) { return a<b?a:b; }

>>
>> Different isn't the same. Since max2 doesn't do the same thing as
>> std::max, despite the misleading name, the measurements don't mean much.
>>
>> --
>> Pete
>> Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
>> Standard C++ Library Extensions: a Tutorial and Reference
>> (www.petebecker.com/tr1book)

>
> Sorry for the mistake, it's a copy paste error, of course you should
> read
>
> template<class T> T max2(const T&a, const T&b) { return a>b?a:b; }
>
> But the timings stay the same ...
> So where could this come from ?


I think your testing code is pretty obscure.

Can you reproduce in small simple test program, regardless of testing std first
or second?

If so, might be that std::max is instrumented in some way. Check your compiler's
documentation about switches and preprocessor symbols. Please post your results
-- however it turns out!


Cheers,

- Alf

--
blog at <url: http://alfps.wordpress.com>
 
Reply With Quote
 
Puppet_Sock
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 12:10*pm, eLVa <(E-Mail Removed)> wrote:
> Hi everyone,
>
> Below is a test code I made after noticing a big difference in terms
> of running time between the use of std::min or std::max and a
> templated version of it. I'm not sure where it does come from, so any
> comments are welcome.
> The code does not anything usefull, but it is a simplification of the
> real code (the purpose is only to show the difference of timings ..)
>
> I don't know if there is a proper way of posting a chunk of code, so I
> just drop it here ...
>
> #include <algorithm>
> #include <iostream>
> #include <sys/time.h>
>
> typedef unsigned char uchar;
> typedef unsigned int uint;
> using namespace std;
>
> void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
> * * int vlm, vlp;
> * * for (int i=0; i<w; ++i) {
> * * * * vlm = (i>0 ? (pL[i-1]+pL[i])/2 : pL[i]);
> * * * * vlp = (i<w-1 ? (pL[i]+pL[i+1])/2 : pL[i]);
>
> * * * * pD[i] = std::min<int>(vlp,std::max<int>(vlm,pR[i]));
> * * }
>
> }


Hmmm... vlm and vlp are int, getting assigned the
results of uchar arithmetic.

Hmmm... pR[i] is a uchar, but the max that is getting
called is max<int>. Hmmm...

Hmmm... pD[i] is a unit getting assigned the result
of std::min<int> Hmmm...

Also, you are doing a compare of i to 0 and a compare
of i to w-1 for each call of std::max and std::min.
And you are doing two averages for most cases.
So, I'm expecting that calls to min and max do not
dominate this function.

I wonder if you can't make a function that is
dominated by the max/min calls. I don't know just
what is happening to produce the reported values
you gave, but I'm thinking they don't show what
you think they show.
Socks
 
Reply With Quote
 
Jeff Flinn
Guest
Posts: n/a
 
      04-11-2011
eLVa wrote:
> On Apr 11, 1:06 pm, Pete Becker <(E-Mail Removed)> wrote:
>> On 2011-04-11 12:10:55 -0400, eLVa said:
>>
>>
>>
>>> template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
>>> template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }

>> Different isn't the same. Since max2 doesn't do the same thing as
>> std::max, despite the misleading name, the measurements don't mean much.
>>
>> --
>> Pete
>> Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
>> Standard C++ Library Extensions: a Tutorial and Reference
>> (www.petebecker.com/tr1book)

>
> Sorry for the mistake, it's a copy paste error, of course you should
> read
>
> template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }
>
> But the timings stay the same ...
> So where could this come from ?


What happens if you call fn2 first, then fn?

Jeff
 
Reply With Quote
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 3:06*pm, Puppet_Sock <(E-Mail Removed)> wrote:
> On Apr 11, 12:10*pm, eLVa <(E-Mail Removed)> wrote:
>
>
>
> > Hi everyone,

>
> > Below is a test code I made after noticing a big difference in terms
> > of running time between the use of std::min or std::max and a
> > templated version of it. I'm not sure where it does come from, so any
> > comments are welcome.
> > The code does not anything usefull, but it is a simplification of the
> > real code (the purpose is only to show the difference of timings ..)

>
> > I don't know if there is a proper way of posting a chunk of code, so I
> > just drop it here ...

>
> > #include <algorithm>
> > #include <iostream>
> > #include <sys/time.h>

>
> > typedef unsigned char uchar;
> > typedef unsigned int uint;
> > using namespace std;

>
> > void fn(const uchar *pL, const uchar *pR, int w, uint *pD) {
> > * * int vlm, vlp;
> > * * for (int i=0; i<w; ++i) {
> > * * * * vlm = (i>0 ? (pL[i-1]+pL[i])/2 : pL[i]);
> > * * * * vlp = (i<w-1 ? (pL[i]+pL[i+1])/2 : pL[i]);

>
> > * * * * pD[i] = std::min<int>(vlp,std::max<int>(vlm,pR[i]));
> > * * }

>
> > }

>
> Hmmm... vlm and vlp are int, getting assigned the
> results of uchar arithmetic.


Is that a problem ?

>
> Hmmm... pR[i] is a uchar, but the max that is getting
> called is max<int>. Hmmm...


Again, is it a problem ?
It's a subsample of my code, where later I use int calculations ...

>
> Hmmm... pD[i] is a unit getting assigned the result
> of std::min<int> Hmmm...
>
> Also, you are doing a compare of i to 0 and a compare
> of i to w-1 for each call of std::max and std::min.
> And you are doing two averages for most cases.
> So, I'm expecting that calls to min and max do not
> dominate this function.


The other function (fn2) is doing the same so why would it be
dominated by that time in one case, not in the other ?

>
> I wonder if you can't make a function that is
> dominated by the max/min calls. I don't know just
> what is happening to produce the reported values
> you gave, but I'm thinking they don't show what
> you think they show.
> Socks


I tried to but depending on what I do, the results are not showing
those differences ...
I'm not sure what to test, and how to diagnose the real problem!


 
Reply With Quote
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 3:09*pm, Jeff Flinn <(E-Mail Removed)> wrote:
> eLVa wrote:
> > On Apr 11, 1:06 pm, Pete Becker <(E-Mail Removed)> wrote:
> >> On 2011-04-11 12:10:55 -0400, eLVa said:

>
> >>> template <class T> T min2(const T&a, const T&b) { return a<b?a:b; }
> >>> template <class T> T max2(const T&a, const T&b) { return a<b?a:b; }
> >> Different isn't the same. Since max2 doesn't do the same thing as
> >> std::max, despite the misleading name, the measurements don't mean much.

>
> >> --
> >> * Pete
> >> Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
> >> Standard C++ Library Extensions: a Tutorial and Reference
> >> (www.petebecker.com/tr1book)

>
> > Sorry for the mistake, it's a copy paste error, of course you should
> > read

>
> > template <class T> T max2(const T&a, const T&b) { return a>b?a:b; }

>
> > But the timings stay the same ...
> > So where could this come from ?

>
> What happens if you call fn2 first, then fn?


I get the same results (fn2 is two times faster than fn).

>
> Jeff


 
Reply With Quote
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 3:18*pm, Paavo Helde <(E-Mail Removed)> wrote:
> eLVa <(E-Mail Removed)> wrote in news:fd844d48-c7d9-4356-9ffc-
> (E-Mail Removed):
>
>
>
>
>
> > Hi everyone,

>
> > Below is a test code I made after noticing a big difference in terms
> > of running time between the use of std::min or std::max and a
> > templated version of it. I'm not sure where it does come from, so any
> > comments are welcome.

> ...
> > * * int h = 400, w = 500;
> > * * uchar *T1[h], *T2[h];
> > * * uint *T3[h];

>
> > * * for (int i=0; i<h; ++i) {
> > * * * * T1[i] = new uchar[w];
> > * * * * T2[i] = new uchar[w];
> > * * * * T3[i] = new uint[w];
> > * * }

>
> If you are so keen on performance then note that allocating each row
> separately may take quite a lot of time, and accessing the data later may*
> also be slower because of worse locality and double indirection (the
> latter is not a problem in this concrete example as you take out the row
> pointers beforehand). In general it might be faster to allocate the whole
> array as new uchar[w*h] and access as T1[j*w+i]. YMMV, of course.


This is a sample of my code where I use a function that's applied on
two "lines".
In my real code, the array are allocated continuously, but for the
test, I found it simpler
to juste do it quickly and allocating each row separately.

The purpose of this sample was to show the problem, and I don't think
allocating data continuously might change it.


>
> > Then I compiled it with :
> > *g++ test.cpp -O3 -o test

>
> > And this are the results :
> > Took 1.58356
> > Took 0.789235

>
> > So the second version which use templated functions is approx 2times
> > faster. Is it normal ?

>
> std::min/max are also template functions.
>
> FWIW, there seems to be some effect similar to your findings also in
> Windows (VS2010 64-bit Release build, with timeofday replaced by clock
> ()). Seems to be due to different inlining of functions, but I'm not
> sure. The difference is not so large, ca 0.93s vs 0.71s. In any case it's
> quite strange, std::min/max should not have any intrinsic overhead.


Yes I know that std::min/max are template functions, however I don't
understand where the overhead is coming from.
Do any of you experience the same, or is it some kind of compiler
trick ? (I have no knowledge on this, and that's why I'm eager to know
about your testings).

>
> Cheers
> Paavo


 
Reply With Quote
 
Martijn van Buul
Guest
Posts: n/a
 
      04-11-2011
* eLVa:
> gettimeofday(&t1, NULL);
> for (int z=0; z<1000; ++z) {
> for (int j=0; j<h; ++j) {
> uchar *p1 = T1[j];
> uchar *p2 = T2[j];
> uint *p3 = T3[j];
> fn(p1, p2, w, p3);
> }
> }
> gettimeofday(&t2, NULL);
> cout << "Took " << (t2.tv_sec-t1.tv_sec)+(t2.tv_usec-t1.tv_usec)/
> 1.e6 << endl;


While I doubt it'll explain the differences you're seeing, one word of
advice:

You're benchmarking against 'real time' here, which may be skewing your
results, especially on a loaded system. It might even be getting distorted
by efforts to keep the system clock synchronised.

It's usually safer to benchmark against process runtime - On Unix-like
systems, see getrusage or clock.

--
Martijn van Buul - http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
eLVa
Guest
Posts: n/a
 
      04-11-2011
On Apr 11, 2:48*pm, "Alf P. Steinbach /Usenet" <alf.p.steinbach
(E-Mail Removed)> wrote:
> * eLVa, on 11.04.2011 19:58:
>
>
>
> > On Apr 11, 1:06 pm, Pete Becker<(E-Mail Removed)> *wrote:
> >> On 2011-04-11 12:10:55 -0400, eLVa said:

>
> >>> template<class T> *T min2(const T&a, const T&b) { return a<b?a:b; }
> >>> template<class T> *T max2(const T&a, const T&b) { return a<b?a:b; }

>
> >> Different isn't the same. Since max2 doesn't do the same thing as
> >> std::max, despite the misleading name, the measurements don't mean much.

>
> >> --
> >> * *Pete
> >> Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
> >> Standard C++ Library Extensions: a Tutorial and Reference
> >> (www.petebecker.com/tr1book)

>
> > Sorry for the mistake, it's a copy paste error, of course you should
> > read

>
> > template<class T> *T max2(const T&a, const T&b) { return a>b?a:b; }

>
> > But the timings stay the same ...
> > So where could this come from ?

>
> I think your testing code is pretty obscure.
>
> Can you reproduce in small simple test program, regardless of testing stdfirst
> or second?
>
> If so, might be that std::max is instrumented in some way. Check your compiler's
> documentation about switches and preprocessor symbols. Please post your results
> * -- *however it turns out!


After some testing, I have found that the particular compiler I was
using (/usr/bin/g++ v4.0.1 on MacOsX) does not give the same timings
as a recent one (g++ v4.4). In this one, both functions works with the
same timings.
As for the Linux version, I have not tested, but I suppose this is the
same issue.

I consider my problem fixed (as it seems it was just a problem of
updating my version of g++)
Does any of you have the same difference in timings ?

>
> Cheers,
>
> - Alf
>
> --
> blog at <url:http://alfps.wordpress.com>


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Using own classloader inside J2EE to load and unload own classes. Stefan Siegl Java 1 07-02-2013 05:05 AM
Allowing access to my own computers within my own network =?Utf-8?B?VHJldm9y?= Wireless Networking 2 07-20-2006 09:05 PM
I have built my own (simple) thread manager [TM], but just found java 5 has its own. Saverio M. Java 0 07-03-2006 08:52 AM
Your own photos in your own book Frank ess Digital Photography 1 12-09-2004 05:54 PM
please help me in distinguish redefining functions, overloading functions and overriding functions. Xiangliang Meng C++ 1 06-21-2004 03:11 AM



Advertisments