ben wrote:
>
> well, it sounds like a good point. if you're right then it's a very
> good answer and one i'd definitely want. the exercise is just from a
> book. i don't have to do it as stated by the exercise. if the way
> that's detailed isn't so useful and there's a much more useful way
then
> i'd prefer to learn and do it a way that's useful. i do intend to use
> this stuff in the real world -- that's what i'm learning it for.
Good decision. Can't remember who said "In theory, theory and practice
are the same, in practice, they're different."
>
> i'm just wondering though, if you're comparing operation counts of
code
> compiled using the same settings, and the different programmes are
> written in a similar fashion (that is, for example, not comparing one
> programme that unrolls a loop and another that doesn't -- code
written
> in the same style) then the number of operations is a reasonable way
to
> compare? basically, use the same base settings and style. (although
> "code written in the same style" is probably pretty ambiguous)
These days, it would be unwise to manually unroll a loop in code
without timings to back it up. Compilers typically understand the
associated branching costs (whether cheap or expensive, cost of
mispredicted branches, etc) and pipeline scheduling better than most
people do.
>
> further thought: different operations take different amounts of time
> don't they? -- i think so. so that'd make operation counting less
> useful.
Yes, although it depends on the processor architecture exactly how
different the numbers can be. Some instructions can be executed in
parallel. Some instructions may need to be internally broken down into
smaller micro-ops, each of which is individually executed.
> for instance i think accessing a word of memory that isn't in
> the cpu's cache can take several clock cycles, whereas if that data
is
> in cache the access might take just one clock cycle. correct?
Worse, much of the time, unless you're dealing with older or embedded
processors. It can take hundreds or even thousands of cycles on newer
fast machines. Typically it's not one cycle from cache, either, and it
will depend on which cache it's coming from.
For the sake of illustration, let's revisit a previous topic-- loop
unrolling. Let's say you manually unrolled a bunch of loops all over
your code. Are you sure that this new bulk of code does not force some
other critical part of code out of cache, resulting in several hundred
cycle wastage every time it has to be re-fetched? Now the several
cycles you saved looks like nothing. Think of the compiler as your
friend that helps you find the sweet spot amongst many different
competing factors.
>
> so basically you're saying timing is the way to measure algorithms?
>
Absolutely. First, optimize algorithmic complexity. Second, implement
it cleanly. Third, play with your compiler settings/hints and profile
it. Fourth and last, which is rarely needed, micro-optimize.
The fastest linked list in the world will get creamed by a slow hash or
tree if it's used inappropriately. Compilers do a better job of
optimizing if the code is implemented cleanly. Profiling shows you
where the bulk of _time_ is spent, so you don't go chasing false leads
and trying to optimize things that will give you a .00001% increase.
Mark F. Haigh