On 3/16/2013 7:21 PM, glen herrmannsfeldt wrote:
> Eric Sosman <> wrote:
>
> (snip)
>
>>> for(i=1; i<N; i++)
>>> for(j=i; j>0; j--)
>>> {
>>> if (a[j-1] > a[j])
>>> swap(a[j-1], a[j]);
>>> else
>>> break;
>>> }
>
>> A simple change will speed this up significantly on most
>> machines. The problem is with the swap() operation: It will
>> exchange (for example) items [10] and [9], then [9] and [8],
>> then [8] and [7] -- six memory writes to slide the original
>> [10] element three places to the left. By holding the "moving"
>> element in a temporary throughout, you can avoid almost half
>> the writes:
>
> Yes, but the cache might also figure that out. Or it might not.
>
> It is usual to study sort algorithms based on comparisons
> and stores, but that might not accurately indicate the time
> on real hardware. But much sort research was done before caching
> was common in processors, and even so it isn't easy to take
> into account.
That's why nice academic treatises on the asymptotic behavior
of this or that algorithm are not as useful as their authors may
wish you'd think.
People count comparisons (or swaps) because it's easy, not
because it's relevant. As you say, it used to be relevant, maybe
twenty or twenty-five years ago -- but if you're using such counts
to make performance decisions today, well, you're in the wrong
century. "Memory is the new disk," says Cliff Click, pointing out
that cache misses have supplanted page faults as the performance
bugaboos of modern systems. As far as I know (which ain't all *that*
far), modern computer scientists have not yet figured out how to
treat the Memory Gotcha mathematically. Not satisfactorily, anyhow:
The computational models all start with "Consider a spherical cow."
Meanwhile, what's a practical programmer to do? You can write
N different sort routines and measure their speed on the target
hardware, but it'll turn out tomorrow's hardware favors a different
implementation than today's. That's a losing game: "It takes all
the running you can do, to keep in the same place." An approach
with less certainty but (perhaps) greater hope is to try to put the
least pressure on those parts of the system you imagine may be prone
to slowness under stress. In the case at hand, we can hope that a
cache (or caches, or write buffers, or magic) will deal with what
looks like a lot of memory writes, or we can make fewer of them to
begin with. Although there's no certainty, the latter approach seems
to my mind to be less likely to come a cropper.
--
Eric Sosman
d