Binary tree search vs Binary search

 
 
Bogdan
Guest
Posts: n/a
 
      10-18-2010
Hi

 I would like to check this result:
I have tested the binary tree search API from Linux (<search.h>) by
searching for a given word in a binary tree built from an unsorted
dictionary. The second program uses the same dictionary to initialize
an array, sorts it with qsort(), and then searches for the same word
with the binary search function bsearch().
 Running both programs shows that the binary search (the second
program) is almost twice as fast as the binary tree search.
 I have also used the already-sorted dictionary directly (so qsort()
should hit its worst-case complexity), but the second program is still
the fastest.
 Is this the correct result? What advantage does binary tree search
have over qsort() followed by binary search?

thanks
Bogdan
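
For reference, a minimal sketch of the two approaches being compared
(not the OP's code: it assumes a POSIX system providing tsearch()/tfind()
in <search.h>, and the tiny word list and comparator names are made up
for illustration):

#define _XOPEN_SOURCE 700   /* expose tsearch()/tfind() under strict modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <search.h>

/* Comparator for qsort()/bsearch(): the array elements are char pointers. */
static int cmp_elems(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Comparator for tsearch()/tfind(): the keys stored are the strings themselves. */
static int cmp_keys(const void *a, const void *b)
{
    return strcmp(a, b);
}

int main(void)
{
    const char *dict[] = { "pear", "apple", "plum", "fig", "cherry" };
    size_t n = sizeof dict / sizeof dict[0];
    const char *word = "fig";

    /* Second program's approach: sort the array, then binary-search it. */
    qsort(dict, n, sizeof dict[0], cmp_elems);
    const char **hit = bsearch(&word, dict, n, sizeof dict[0], cmp_elems);
    printf("bsearch: %s\n", hit ? *hit : "not found");

    /* First program's approach: build a binary search tree, then look the word up. */
    void *root = NULL;
    for (size_t i = 0; i < n; i++)
        tsearch(dict[i], &root, cmp_keys);
    void *node = tfind(word, &root, cmp_keys);
    /* tfind() returns a pointer to the tree node; per tsearch(3) it can be
       dereferenced as a pointer to the stored key. */
    printf("tfind:   %s\n", node ? *(const char **)node : "not found");

    return 0;
}

Both lookups do roughly log2(n) string comparisons on typical inputs, so a
large, persistent speed difference comes from constant factors and from the
shape of the tree, as the replies below discuss.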
 
 
 
 
 
Tom St Denis
Guest
Posts: n/a
 
      10-18-2010
On Oct 18, 1:14 pm, Bogdan <(E-Mail Removed)> wrote:
> Hi
>
>  I would like to check this result:
> I have tested the binary tree search API from Linux (<search.h>) by
> searching for a given word in a binary tree built from an unsorted
> dictionary. The second program uses the same dictionary to initialize
> an array, sorts it with qsort(), and then searches for the same word
> with the binary search function bsearch().
>  Running both programs shows that the binary search (the second
> program) is almost twice as fast as the binary tree search.
>  I have also used the already-sorted dictionary directly (so qsort()
> should hit its worst-case complexity), but the second program is still
> the fastest.
>  Is this the correct result? What advantage does binary tree search
> have over qsort() followed by binary search?


Binary search (e.g. ordered array) is O(log(n)) search but O(n)
insert. Binary tree is O(log(n)) search *and* O(log(n)) insert.

The overhead you're finding by a constant is acceptable (and falls
below the noise in big-Oh notation). They should in theory take
roughly the same time as n => oo, but for small values they'll skew.

Tom
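
To make the insert-cost difference above concrete, here is a small sketch
(illustrative only, not from the thread): inserting one key into an
already-sorted array finds the slot in O(log n) comparisons, but making
room for it means shifting, on average, half the array.

#include <stdio.h>
#include <string.h>

/* Insert `key` into the sorted int array `a` holding `*n` elements
 * (the caller must ensure capacity for one more).  The binary search
 * for the slot is O(log n); the memmove() that opens the slot is O(n).
 * A binary search tree avoids that shift. */
static void sorted_insert(int *a, size_t *n, int key)
{
    size_t lo = 0, hi = *n;
    while (lo < hi) {                         /* find the insertion point */
        size_t mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    memmove(&a[lo + 1], &a[lo], (*n - lo) * sizeof a[0]);   /* O(n) shift */
    a[lo] = key;
    ++*n;
}

int main(void)
{
    int a[8] = { 2, 3, 5, 8, 13 };
    size_t n = 5;
    sorted_insert(a, &n, 7);
    for (size_t i = 0; i < n; i++)
        printf("%d ", a[i]);                  /* prints: 2 3 5 7 8 13 */
    putchar('\n');
    return 0;
}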
 
 
 
 
 
James Dow Allen
Guest
Posts: n/a
 
      10-18-2010
On Oct 19, 12:43 am, Tom St Denis <(E-Mail Removed)> wrote:
> On Oct 18, 1:14 pm, Bogdan <(E-Mail Removed)> wrote:
> >  Running both programs shows that the binary search (the second
> > program) is almost twice as fast as the binary tree search.


> Binary search (e.g. ordered array) is O(log(n)) search but O(n)
> insert. Binary tree is O(log(n)) search *and* O(log(n)) insert.


Correct.

> The overhead you're finding by a constant is acceptable


How in heaven's name do you know that?? I was once called
in to fix a poorly designed program that took 10 hours
to process data from an 8-hour shift. Customer was running
two shifts per day OK, but couldn't start a graveyard
shift till the software was sped up! Perhaps I should have
just told them "No problem! Some guy on Usenet says
this is 'acceptable'."

> (and falls below the noise in big-Oh notation).


Meaningless gibberish.

> > They should in theory take
> roughly the same time as n => oo, but for small values they'll skew.


Wrong again. They take respective approx. times of
    a_1 + b_1 log N   and   a_2 + b_2 log N
What *is* correct is that you want to choose large N to
measure (b_1 / b_2) accurately. To assume a priori this ratio
will be unity shows much confusion.

Hope this helps.
James Dow Allen
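
(An aside, restating the formulas above in the same notation.) For large N
the fixed costs a_1 and a_2 become negligible, so

    T_1(N) / T_2(N) = (a_1 + b_1 log N) / (a_2 + b_2 log N)  -->  b_1 / b_2

as N grows. Big-Oh notation only says that both running times grow like
log N; it says nothing about the ratio b_1 / b_2, which is exactly where a
persistent factor of two, like the one the OP measured, lives.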
 
 
Ben Bacarisse
Guest
Posts: n/a
 
      10-18-2010
Bogdan <(E-Mail Removed)> writes:

> [OP's comparison of qsort()+bsearch() against the <search.h> binary
> tree, with bsearch() coming out almost twice as fast, snipped]


I won't repeat what's been said already, but I will add that to get a
better understanding of what's happening you'll probably have to post
the code. It may be that matters unrelated to the abstract algorithms,
such as IO or memory allocation, are having a significant impact.

--
Ben.
 
 
robertwessel2@yahoo.com
Guest
Posts: n/a
 
      10-18-2010
On Oct 18, 12:14 pm, Bogdan <(E-Mail Removed)> wrote:
> [OP's question snipped; see the start of the thread]



In addition to what others have said, the pointer computation for
binary search may be a bit quicker than the pointer chasing you get
for a binary tree. The binary search may be a bit more cache-friendly
as well, although that's all implementation dependent and there are
few absolutes.

A significant difference is that a binary tree can be unbalanced,
which may make certain leaf items considerably deeper in the tree than
the best case lg(n) depth. A worst case is often generated by
inserting items in the binary tree in ascending or descending order,
which may reduce the binary tree to a sequential linked list. There
are tree structures that remain balanced (FSVO "balance") during
insertion and deletion, but they tend to be more complex than a simple
binary tree.
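
To illustrate the degenerate case with a sketch (a deliberately naive,
hand-rolled BST, not the <search.h> one; POSIX does not say whether
tsearch() rebalances, so a given implementation may or may not suffer
from this):

#include <stdio.h>
#include <stdlib.h>

/* A non-balancing binary search tree.  Insert keys in ascending order
 * and every node gets only a right child: the "tree" degenerates into
 * a linked list, and search drops from O(log n) to O(n). */
struct node {
    int key;
    struct node *left, *right;
};

static struct node *insert(struct node *t, int key)
{
    if (t == NULL) {
        if ((t = malloc(sizeof *t)) == NULL)
            abort();
        t->key = key;
        t->left = t->right = NULL;
    } else if (key < t->key) {
        t->left = insert(t->left, key);
    } else if (key > t->key) {
        t->right = insert(t->right, key);
    }
    return t;
}

static int depth(const struct node *t)
{
    if (t == NULL)
        return 0;
    int l = depth(t->left), r = depth(t->right);
    return 1 + (l > r ? l : r);
}

int main(void)
{
    struct node *root = NULL;
    for (int i = 0; i < 1000; i++)        /* sorted input: the worst case */
        root = insert(root, i);
    printf("depth with sorted input: %d\n", depth(root));   /* 1000, not ~10 */
    return 0;
}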

 
 
Gene
Guest
Posts: n/a
 
      10-19-2010
On Oct 18, 1:14 pm, Bogdan <(E-Mail Removed)> wrote:
> [OP's question snipped; see the start of the thread]


As has been said, pre-qsorting is an average-case O(n log n) operation,
and then looking up O(n) entries with binary search will again require
O(n log n) time. Inserting O(n) items into, e.g., a red-black or similar
balanced tree requires O(n log n) time, and looking up O(n) items is
again O(n log n). So the two solutions have comparable asymptotic
performance as long as the dictionary isn't modified once sorted. If
modifications can occur, the tree works better because each update
requires only O(log n) time; the array requires O(n) time per update,
as has been explained.

Or of course if the tree implementation is not self-balancing, then
search length is a random phenomenon.

You said you're looking up "a given word." If you are looking up only
a single word, then the chance difference between the path length in the
balanced tree and the number of probes in the binary search table alone
could account for a factor of 2.

Otherwise, it's that constant factors related to the implementations
are different. Inserting a string into a tree requires allocating a
node in addition to the string. There is space overhead for pointers
in the tree that isn't there in the sorted list. Extra space and the
way it falls in memory can affect cache and paging performance as has
been mentioned. Rebalancing the tree is overhead that isn't present
in the array, etc. A factor of 2 or 3 difference is not hard to
believe.

FWIW, I just happened to read this:
http://ieeexplore.ieee.org/Xplore/lo...hDecision=-203
It's a nicely done article about how merely realigning critical code
segments with respect to page / cache line boundaries (by e.g.
changing the size of the OS shell environment when starting the
program) can in some cases affect run time by 20%.

 
 
Eric Sosman
Guest
Posts: n/a
 
      10-19-2010
On 10/18/2010 9:44 PM, Gene wrote:
> On Oct 18, 1:14 pm, Bogdan <(E-Mail Removed)> wrote:
>> [OP's question snipped]

>
> As has been said, pre-qsorting is an average-case O(n log n) operation,
> and then looking up O(n) entries with binary search will again require
> O(n log n) time. Inserting O(n) items into, e.g., a red-black or similar
> balanced tree requires O(n log n) time, and looking up O(n) items is
> again O(n log n). So the two solutions have comparable asymptotic
> performance as long as the dictionary isn't modified once sorted.


If by "comparable" you mean "essentially the same except for
negligible fiddly bits," you are repeating Tom St Denis' error: The
fact that two processes have the same order-of-magnitude asymptotic
performance does not mean they have "comparable" performance. For
example the time (or memory or comparisons or whatever) used two
processes A and B, both of O(1), can differ by an arbitrarily large
amount.

Brutally simplistic example: Take an O(N log N) sorting method S
that uses 100 nsec for each comparison and negligible time for other
operations. Add a 2.7-month delay loop to each comparison to arrive
at a new sorting method S2. Assertion: Both S and S2 have O(N log N)
performance, even though S can sort 1000 items in a millisecond while
S2 takes 2243 years. A millisecond is "comparable" to two-plus
millennia only for people who have way too much time on their hands.

--
Eric Sosman
 
 
Tom St Denis
Guest
Posts: n/a
 
      10-19-2010
On Oct 18, 3:04 pm, James Dow Allen <(E-Mail Removed)> wrote:
> > The overhead you're finding by a constant is acceptable

>
> How in heaven's name do you know that?? I was once called
> in to fix a poorly designed program that took 10 hours
> to process data from an 8-hour shift. Customer was running
> two shifts per day OK, but couldn't start a graveyard
> shift till the software was sped up! Perhaps I should have
> just told them "No problem! Some guy on Usenet says
> this is 'acceptable'."


Well I didn't say it was optimal, I just said it's not a big deal. If
he's running into a timing hazard then it is a problem. See how that
works [read on...]

> > (and falls below the noise in big-Oh notation).

>
> Meaningless gibberish.


No, it's not. See the operation is log(n) time which is greater than
a constant multiple c. O(c*log(n)) is equiv to O(log(n)) therefore a
small constant factor difference is "below the noise" or irrelevant.

> > They should in theory take
> > roughly the same time as n => oo, but for small values they'll skew.

>
> Wrong again. They take respective approx. times of
>     a_1 + b_1 log N   and   a_2 + b_2 log N
> What *is* correct is that you want to choose large N to
> measure (b_1 / b_2) accurately. To assume a priori this ratio
> will be unity shows much confusion.


You really don't understand big-Oh notation do you? ...

Tom
 
 
Ben Bacarisse
Guest
Posts: n/a
 
      10-19-2010
Tom St Denis <(E-Mail Removed)> writes:

> On Oct 18, 3:04 pm, James Dow Allen <(E-Mail Removed)> wrote:
>> Tom St Denis <(E-Mail Removed)> writes: [attribution restored]


<snip talking about two O(log n) algorithms>

>> > They should in theory take
>> > roughly the same time as n => oo, but for small values they'll skew.

>>
>> Wrong again. They take respective approx. times of
>>     a_1 + b_1 log N   and   a_2 + b_2 log N
>> What *is* correct is that you want to choose large N to
>> measure (b_1 / b_2) accurately. To assume a priori this ratio
>> will be unity shows much confusion.

>
> You really don't understand big-Oh notation do you? ...


Which bit makes you think that?

Maybe you should define what you mean by "they should in theory take
roughly the same time as n => oo"?

James Dow Allen is making the reasonable assumption that this means

lim [n->oo] T1(n)/T2(n) = 1

where T1 and T2 are functions that describe the two algorithms' running
times. How do you define "taking roughly the same time" so that it
covers the example given by James?

--
Ben.
 
 
James Dow Allen
Guest
Posts: n/a
 
      10-19-2010

On Oct 19, 12:43 am, Tom St Denis <(E-Mail Removed)> wrote:
> [OP reported on a speed comparison]:
>> [bsearch presents twice the speed of bin tree search]

> The overhead you're finding by a constant is acceptable (and falls
> below the noise in big-Oh notation). They should in theory take
> roughly the same time as n => oo, but for small values they'll skew.


And now, he tries to defend this foolishness!

First note that OP presents a speed-doubling and Tom implies
that it will go away for large n.
Tom: Do you still believe that, or did Googling "big-Oh" help?

Tom St Denis <(E-Mail Removed)> wrote, in
news:(E-Mail Removed):

> On Oct 18, 3:04 pm, James Dow Allen <(E-Mail Removed)> wrote:
>> > The overhead you're finding by a constant is acceptable


Tom, did your newsreader remove the attribution on the above
line? It's by Tom St Denis. If you now understand it was a foolish
remark, good for you. But don't remove the attribution, please.

>>
>> How in heaven's name do you know that?? ... Perhaps I should have
>> just told them "No problem! Some guy on Usenet says
>> this is 'acceptable'."

>
> Well I didn't say it was optimal, I just said it's not a big deal. If
> he's running into a timing hazard then it is a problem.


No. It's true that you don't say it's "optimal". But it's
false that you said "it's no big deal." What you wrote is
that it's "acceptable" and my question remains:
How do you know that???

If you meant it "might be acceptable" or "is probably fast enough"
you should have written that. I'm not a stickler for "dotting every i",
but in this case the way you phrased your answer was, to be blunt,
inane.

It's not unusual for such a search to dominate a program's running time.
A speedup by a factor of two is, as a matter of fact, not unlikely to be
important! True, it's "often (or even usually) unimportant" but that's
not what you wrote, is it? BTW, since you're wrong on *every single
point* in your message here, I doubt you'll respond again, but if you
do, kindly have the courtesy to quote and correctly attribute your own
incorrect remarks which are under review.

>> > (and falls below the noise in big-Oh notation).

>>
>> Meaningless gibberish.

>
> No, it's not.


Yes it is. There is no "noise" in big-Oh notation for
it to "fall below." If this isn't clear, please present
an example which falls *above* your alleged "noise" in
big-Oh notation.

> O(c*log(n)) is equiv to O(log(n))


So you *did* know this much after all!
(Or did you Google for it just now?)


>> > They should in theory take
>> > roughly the same time as n => oo, but for small values they'll
>> > skew.

>>
>> Wrong again. They take respective approx. times of
>>     a_1 + b_1 log N   and   a_2 + b_2 log N
>> What *is* correct is that you want to choose large N to
>> measure (b_1 / b_2) accurately. To assume a priori this ratio
>> will be unity shows much confusion.

>
> You really don't understand big-Oh notation do you? ...


We don't mind when ignorant newbies post here.
Even when they act like insolent jerks, they're amusing.

James Dow Allen
 