Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C++ > std::priority_queue: Abysmally bad performance versus std::set, can'tfigure out why.

Reply
Thread Tools

std::priority_queue: Abysmally bad performance versus std::set, can'tfigure out why.

 
 
Jeremy Murphy
Guest
Posts: n/a
 
      11-30-2012
Hi everyone,

I recently implemented A* for some research, and because I need both the pqinterface AND an iterable interface to the open list, I (somewhat naively)started using std::set. Now that the basic implementation is done, out ofcuriosity more than anything else, I decided to go back and try std:riority_queue for the special case problem where I don't need to iterate over the open list. (I also realized that the boost::heap pq implementations allow iterating, albeit with poor efficiency.)

To my surprise and horror, when I used std:riority_queue in place of std::set, the time and space performance is abysmally bad. So bad that I can'timagine why. For example, a problem of size n = 17 takes ~3 s and ~24 MB using std::set versus ~10 s and ~400 MB using std:riority_queue. At n = 20, the pq implementation actually tries to allocate > 3 GB of RAM and usually gets OOM-killed.

I am stumped as to what is going on, because the solver yields identical answers and debugging output for both implementations. I have tried both std::vector and std::deque as the underlying container.

Now here I have to apologize that my minimal working example is not very minimal at all. I tried to recreate the behaviour without all the A* business, but with no success. Here is a link to the library in question:

http://code.google.com/p/policy-based-search/

There are two example problems, of which the TSP is the only serious one and is where the problem is observable. The master branch is std::set, the pq_interface branch uses std:riority_queue. The TSP example requires one parameter, the problem size, e.g.: ./TSP 17.

I don't really expect people to have time to delve through my code, but if you do I would of course be incredibly humbled and appreciative. (I sincerely hope that you don't need to understand the TSP example itself and that just inspecting the library will suffice.) Even if you just have some ideas off the top of your head as to why std:riority_queue would behave like this, that would help. So that you don't have to go looking at the code tounderstand the basics, the queue element type is a 'search node' and is defined roughly as:

struct Node {
Node(std::vector<unsigned int> S, std::shared_ptr<Node> P) : state(S), parent(P) {}
std::vector<unsigned int> state;
std::shared_ptr<Node> parent;
};

and the queue is then defined:

typedef std::shared_ptr<Node> Element;
typedef std:riority_queue<Element, vector<Element>, Comp> PriorityQueue;

Roughly, the algorithm pushes a bunch of nodes (relative to problem size), pops one, repeats. Sorry about the length of my explanation, I tried to say only what I thought necessary, but that amounted to a lot. Thanks inadvance for any advice, small or large. Cheers.

Jeremy
 
Reply With Quote
 
 
 
 
Jeremy Murphy
Guest
Posts: n/a
 
      11-30-2012
So, I've done some quick analysis with the Massif tool in valgrind to compare std::set vs std:riority_queue in my A* implementation.

The tool really slows down processing, so I have only run it on a small problem, n = 13.
Both analysis results attribute the majority of heap usage to std::__shared_count<long specification with references to Node>. A second common heap element is a vector<unsigned int>. Both these elements somehow originate from shared_ptr_base.h, so no doubt account for Node allocation.
Then each graph has a unique heap element. The set graph has a heap element attributed to std::_Rb_tree_iterator<...>, which comes from stl_set.h, sois no doubt the overhead from using std::set. The pq graph has a unique section that is effectively just attributed to the search function, which increases in powers of 2 and I can only assume that is the std::vector of shared_ptrs in the pq.

(MiB) Total __shared_count<...> vector<uint> _Rb_tree Node
set 2.8 1.41 0.52 0.926 0.0
pq 3.0 1.85 0.69 0.0 0.512

So this is quite interesting for two reasons. Although the total difference is 200 KiB, the sum of differences between the common heap elements is ~457 KiB, which says that the pq implementation is storing more Nodes than the set, right? But the unique heap elements are smaller for the pq than theset, which agrees with theory and other tests that a std:riority_queue has less overhead than a std::set.

But wtf is going on?? In light of this, I just added some more debugging to track how many Nodes are allocated, and it is the same in each implementation. Argh! How can that be?

Jeremy
 
Reply With Quote
 
 
 
 
Öö Tiib
Guest
Posts: n/a
 
      11-30-2012
On Friday, 30 November 2012 04:40:03 UTC+2, Jeremy Murphy wrote:
> To my surprise and horror, when I used std:riority_queue in place of std::set, the time and space performance is abysmally bad. So bad that I can't imagine why.

....

> struct Node {
> Node(std::vector<unsigned int> S, std::shared_ptr<Node> P) : state(S), parent(P) {}
> std::vector<unsigned int> state;
> std::shared_ptr<Node> parent;
> };
>
> and the queue is then defined:
>
> typedef std::shared_ptr<Node> Element;
> typedef std:riority_queue<Element, vector<Element>, Comp> PriorityQueue;


I can't really say what is going on ... perhaps make bit smaller example that demos
the problem.

My experience is directly opposite: priority_queue washes the floors with sets,
intrusive_sets, maps, sorted lists etc in context where you always pop the best
and alternate pops and pushes rather rapidly. Also the vector on what priority_queue
adapts is way more memory efficient than std::set.

So ... it feels that you have defect somewhere but can't show where, sorry.
One note ... I tend to remove usage of shared_ptr in performance critical algorithms.
It is heavy and performs way worse than some index, iterator, unique_ptr or raw pointer.
 
Reply With Quote
 
red floyd
Guest
Posts: n/a
 
      11-30-2012
On 11/30/2012 1:08 PM, Öö Tiib wrote:
> On Friday, 30 November 2012 04:40:03 UTC+2, Jeremy Murphy wrote:
>> To my surprise and horror, when I used std:riority_queue in place of std::set, the time and space performance is abysmally bad. So bad that I can't imagine why.

> ...
>
>> struct Node {
>> Node(std::vector<unsigned int> S, std::shared_ptr<Node> P) : state(S), parent(P) {}
>> std::vector<unsigned int> state;
>> std::shared_ptr<Node> parent;
>> };
>>
>> and the queue is then defined:
>>
>> typedef std::shared_ptr<Node> Element;
>> typedef std:riority_queue<Element, vector<Element>, Comp> PriorityQueue;

>
> I can't really say what is going on ... perhaps make bit smaller example that demos
> the problem.
>
> My experience is directly opposite: priority_queue washes the floors with sets,
> intrusive_sets, maps, sorted lists etc in context where you always pop the best
> and alternate pops and pushes rather rapidly. Also the vector on what priority_queue
> adapts is way more memory efficient than std::set.
>
> So ... it feels that you have defect somewhere but can't show where, sorry.
> One note ... I tend to remove usage of shared_ptr in performance critical algorithms.
> It is heavy and performs way worse than some index, iterator, unique_ptr or raw pointer.
>

Maybe OP's queue is reallocating the vector as he adds to the
priority queue, while std::set is node based, so that the memory
usage is higher for priority queue.
 
Reply With Quote
 
Öö Tiib
Guest
Posts: n/a
 
      12-01-2012
On Saturday, 1 December 2012 00:25:26 UTC+2, red floyd wrote:
> On 11/30/2012 1:08 PM, Öö Tiib wrote:
> > On Friday, 30 November 2012 04:40:03 UTC+2, Jeremy Murphy wrote:
> >> To my surprise and horror, when I used std:riority_queue in place ofstd::set, the time and space performance is abysmally bad. So bad that I can't imagine why.

> > ...
> >
> >> struct Node {
> >> Node(std::vector<unsigned int> S, std::shared_ptr<Node> P) : state(S), parent(P) {}
> >> std::vector<unsigned int> state;
> >> std::shared_ptr<Node> parent;
> >> };
> >>
> >> and the queue is then defined:
> >>
> >> typedef std::shared_ptr<Node> Element;
> >> typedef std:riority_queue<Element, vector<Element>, Comp> PriorityQueue;

> >
> > I can't really say what is going on ... perhaps make bit smaller example that demos
> > the problem.
> >
> > My experience is directly opposite: priority_queue washes the floors with sets,
> > intrusive_sets, maps, sorted lists etc in context where you always pop the best
> > and alternate pops and pushes rather rapidly. Also the vector on what priority_queue
> > adapts is way more memory efficient than std::set.
> >
> > So ... it feels that you have defect somewhere but can't show where, sorry.
> > One note ... I tend to remove usage of shared_ptr in performance critical algorithms.
> > It is heavy and performs way worse than some index, iterator, unique_ptr or raw pointer.
> >

> Maybe OP's queue is reallocating the vector as he adds to the
> priority queue, while std::set is node based, so that the memory
> usage is higher for priority queue.


Hmm. Indeed. It is easy to find out if that is the problem by switching
to underlying deque:

typedef std:riority_queue<Element, deque<Element>, Comp> PriorityQueue;

If that improves performance then the problem is that memory is getting
rather madly fragmented by the alorithm. Fortunately there are always
ways to do things lot less dynamically.
 
Reply With Quote
 
Jeremy Murphy
Guest
Posts: n/a
 
      12-13-2012
Thanks everyone for their replies, and sorry about the long delay in my reply.
Like most baffling programming problems, the issue did turn out to be my own stupidity: I was using a non-total comparison function that works as expected in a std:riority_queue but not in a std::set (due to the enforcement of uniqueness).

Thanks for your help, cheers.

Jeremy
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: Mozilla versus IE versus Opera versus Safari Peter Potamus the Purple Hippo Firefox 0 05-08-2008 12:56 PM
equal? versus eql? versus == versus === verus <=> Paul Butcher Ruby 12 11-28-2007 06:06 AM
ActiveX apologetic Larry Seltzer... "Sun paid for malicious ActiveX code, and Firefox is bad, bad bad baad. please use ActiveX, it's secure and nice!" (ok, the last part is irony on my part) fernando.cassia@gmail.com Java 0 04-16-2005 10:05 PM
24 Season 3 Bad Bad Bad (Spoiler) nospam@nospam.com DVD Video 12 02-23-2005 03:28 AM
Web Form Performance Versus Single File Performance jm ASP .Net 1 12-12-2003 11:14 PM



Advertisments