Velocity Reviews - Computer Hardware Reviews

bitset<32> and bitset<64> efficiency

Johannes Bauer
11-29-2012
On 29.11.2012 19:00, Luca Risolia wrote:
> On 29/11/2012 14:24, Johannes Bauer wrote:
>
>> It is definitely not inlined even though I specified -O3.
>
> Try with g++ 4.7 and -Ofast.


-Ofast doesn't change anything with g++ on 4.6.2, underlining the fact
that your "Just make sure to compile with optimizations on" was
ill-advised. Apparently older g++ versions don't inline this construct
for whatever reason.

Regards,
johannes


--
>> Where exactly did you predict the earthquake again?

> At least not publicly!

Ah, the newest and to date most brilliant stroke of our great
cosmologists: the secret prediction.
- Karl Kaos about Rüdiger Thomas in dsa <hidbv3$om2$>
 
Ninds
      11-29-2012
On Thursday, 29 November 2012 17:47:58 UTC, Johannes Bauer wrote:
> On 29.11.2012 15:58, Ninds wrote:
>
> [Quoting chaos]
>
> As others have mentioned, PLEASE take care of the quoting mess that your
> newsreader produces. It makes your messages hard to decipher.
>
> > Is it possible that:
> > 1. For your case, on x86_64 there would be no need for inlining since bitset<64> degenerates exactly to native int?
>
> Well, a bitset<64> in my example is 8 bytes wide while a native int is 4
> bytes. But even if it did degenerate exactly into a native int, it would
> still make a lot of sense for the compiler to inline the code in order to
> simplify it and get rid of the call overhead.
>
> > 2. The same case on a 32bit machine would require a call since the set op is no longer atomic
>
> This doesn't make sense. How does atomicity come into play here? There's
> no guarantee that a bitset alters its bits atomically.
>
> > 3. For bitset<128> your case makes a call?
>
> Yes.
>
> Regards,
> Johannes


Perhaps I was a bit unclear.
What I meant was that if the native int is 32 bits, then it is plausible that the compiler generates different code for the set method of bitset<32> than for, say, bitset<33>. For bitset<32>, setting all the bits can be done in a single operation on one machine word ("atomic" is admittedly an ill-chosen term), whereas for bitset<33> it cannot.
Moreover, for similar reasons, the code generated for the set method of bitset<64> on x86_64 may be quite different from the code generated for it on a 32-bit machine.

 
Marc
      11-30-2012
Ninds wrote:

> I would like to know whether using bitset<32> for bit operations on a
> 32bit machine would generally be as efficient as using a 32bit int.
> Moreover, if the answer is yes would it also hold for bitset<64> and
> 64bit int on a 64 bit arch.
> I realise the standard says nothing about the implementation so there
> is no definitive answer but what is 'likely' to be the case ?


When bitset does the same thing as you would do with an integer (say
with operator| for instance), you can expect roughly the same code in
the end. But if you use .set(i), the standard requires that bitset test
whether i is in range and throw std::out_of_range otherwise. If you were
not going to do such a test with integers, that check will make bitset slower.

If you are concerned about the generated code, you will have to look
through it for anything suspicious. If you can't read asm, with gcc you
can get a first idea of which optimizations happen by looking at the
file produced by -fdump-tree-optimized (it looks like C code).
 
Johannes Bauer
      12-02-2012
On 30.11.2012 19:24, Luca Risolia wrote:
> On 29/11/2012 19:40, Johannes Bauer wrote:
>> -Ofast doesn't change anything with g++ on 4.6.2, underlining the fact
>> that your "Just make sure to compile with optimizations on" was
>> ill-advised. Apparently older g++ versions don't inline this construct
>> for whatever reason.
>
> That is not true. A version of gcc older than yours does inlining when
> minimal optimizations are turned on:


Hmm, in any case, this shows that the OP should look closely, as it
apparently depends on some more factors -- I'm really curious why the
optimization is not happening on my system, actually. Since you're on
x86-32, I tried to compile with "-m32", but still I cannot get my g++ to
optimize the 'set' call. Weird. Output of g++/uname at bottom.

Best regards,
Johannes

[~/tmp]: LC_ALL=C g++ -v
Using built-in specs.
COLLECT_GCC=/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.2/g++
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-pc-linux-gnu/4.6.2/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with:
/opt/tmp-portage/portage/sys-devel/gcc-4.6.2/work/gcc-4.6.2/configure
--prefix=/usr --bindir=/usr/x86_64-pc-linux-gnu/gcc-bin/4.6.2
--includedir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.6.2/include
--datadir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.6.2
--mandir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.6.2/man
--infodir=/usr/share/gcc-data/x86_64-pc-linux-gnu/4.6.2/info
--with-gxx-include-dir=/usr/lib/gcc/x86_64-pc-linux-gnu/4.6.2/include/g++-v4
--host=x86_64-pc-linux-gnu --build=x86_64-pc-linux-gnu --disable-altivec
--disable-fixed-point --without-ppl --without-cloog --enable-lto
--enable-nls --without-included-gettext --with-system-zlib
--disable-werror --enable-secureplt --enable-multilib
--disable-libmudflap --disable-libssp --enable-libgomp
--with-python-dir=/share/gcc-data/x86_64-pc-linux-gnu/4.6.2/python
--enable-checking=release --enable-java-awt=gtk
--enable-languages=c,c++,java,fortran --enable-shared
--enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
--enable-targets=all --with-bugurl=http://bugs.gentoo.org/
--with-pkgversion='Gentoo 4.6.2 p1.4, pie-0.5.0'
Thread model: posix
gcc version 4.6.2 (Gentoo 4.6.2 p1.4, pie-0.5.0)

[~/tmp]: uname -a
Linux joequad 3.3.7 #6 SMP PREEMPT Wed Nov 28 14:59:17 CET 2012 x86_64
Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz GenuineIntel GNU/Linux


W Karas
      12-03-2012
On Sunday, December 2, 2012 7:37:44 AM UTC-5, Luca Risolia wrote:
> On 02/12/2012 09:31, Johannes Bauer wrote:
> > Hmm, in any case, this shows that the OP should look closely, as it
> > apparently depends on some more factors -- I'm really curious why the
> > optimization is not happening on my system, actually. Since you're on
> > x86-32, I tried to compile with "-m32", but still I cannot get my g++ to
> > optimize the 'set' call. Weird. Output of g++/uname at bottom.
>
> In your g++ 4.6 try to increase -finline-limit to an appropriate value:
>
> :~$ g++-4.6 -O1 -c x.cc -o x
> :~$ objdump --demangle -d x | grep call | grep bitset | grep '::set'
> f: e8 00 00 00 00 callq 14 <std::bitset<64ul>::set(unsigned long, bool)+0x14>
>
> :~$ g++-4.6 -O1 -finline-limit=26 -c x.cc -o x
> :~$ objdump --demangle -d x | grep call | grep bitset | grep '::set'
>
> :~$ g++-4.6 -O1 -finline-limit=25 -c x.cc -o x
> :~$ objdump --demangle -d x | grep call | grep bitset | grep '::set'
> f: e8 00 00 00 00 callq 14 <std::bitset<64ul>::set(unsigned long, bool)+0x14>
> 17: e8 00 00 00 00 callq 1c <std::bitset<64ul>::set(unsigned long, bool)+0x1c>


I would get rid of this compile option (-finline-limit). -O, even at the minimum level, should enable inlining. There should be an option that specifies the cumulative number of additional instructions that recursive inlining of a call may add before the recursion is stopped. The default for this should generally be zero, since the gain from executing nominally fewer instructions in bigger code can be more than offset by a larger cache working set. A configurable limit on inlining depth is desirable, but it should only apply when direct or indirect inlining of the same function is detected.
 