On Mar 23, 12:00 pm, Razii <DONTwhatever...@hotmail.com> wrote:
> On Sun, 23 Mar 2008 15:45:21 GMT, red floyd <no.s...@here.dude> wrote:
> >Well, then, you can't make sweeping statements about C++ vs. Java. You
> >can only make statements about *a particular implementation* of C++ vs.
> >Java.
>
> I used the proper flag /O2 in vc++. Also, when you are deploying a
> commercial software, you will have to use flags that target the
> least-common-denominator processor. That's a disadvantage of C++ vs
> JIT language.
There are three things that are not correct about this:
1) The -ffast-math flag to GCC does not use any special processor
instructions. It simply relaxes strict floating-point rules (for
example, it lets the compiler assume there are no NaNs or infinities
and reorder arithmetic), which is safe when you know your code does
not depend on them. This is the flag with the major performance
increase. Do not confuse "targeting the least-common-denominator
processor" with "not using an ideal compiler for the application".
Whether you compile your commercial code with GCC or VC++ is
irrelevant to what machine you are targeting. If your goal is to
target the least-common-denominator, then you would leave off, for
example, the SSE optimizations. However, in my example, it was
-ffast-math that was responsible for the largest speedup. Target
platform is not relevant.
2) It's more of a consequence of only distributing the least-common-
denominator than of using C++. That's a choice you can make. It would
be completely reasonable, for example, for a vendor of high-
performance research applications to distribute SSE-optimized builds.
E.g. Folding@Home (non-commercial, http://folding.stanford.edu)
maintains a GPU-optimized version of their software as well as a
"normal" one, and their non-GPU client checks for various processor
extensions (such as SSE) at run-time and uses optimized code
accordingly. The same goes for 32- and 64-bit applications. It is
also common in OpenGL application development, where applications
can look for specific OpenGL features and take advantage of them
rather than always targeting only the least-common-denominator of
graphics hardware. The same thing applies to Java wrt OpenGL.
Depending on how you *want* to look at it, an Intel machine with SSE
support is just as different from an Intel machine without SSE as it
is from a PowerPC with AltiVec (same with multicore machines). It just
so happens that you can easily target the least-common-denominator for
those Intel machines, but nothing stops you from releasing platform-
optimized builds. So it is not a disadvantage of C++ vs. anything; it
is a disadvantage of sticking to the least-common-denominator.
3) A JIT language has an extra layer of abstraction before the
hardware, which makes these particular kinds of optimizations
difficult anyway. It's apples and oranges: the Java JIT compiler is
a completely different kind of beast than a native code compiler.
The optimizations available to a JIT compiler, and the approaches it
can take to making them, are too different from those of a native
compiler to make a valid comparison. They're just different tools.
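To illustrate the run-time check described in point 2, here is a
minimal C++ sketch. It is not how Folding@Home or any other project
actually does it. All function names are hypothetical, multiply_sse is
only a placeholder standing in for a routine built with SSE options,
and the feature test assumes GCC or Clang on x86, where the
__builtin_cpu_supports builtin is available; on anything else the
sketch falls back to the generic routine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Floats = std::vector<float>;
using MultiplyFn = Floats (*)(const Floats&, const Floats&);

// Least-common-denominator routine: runs on any processor.
Floats multiply_generic(const Floats& a, const Floats& b) {
    assert(a.size() == b.size());
    Floats out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}

// Hypothetical placeholder for a version compiled with SSE options.
// It simply forwards to the generic code so the sketch stays
// self-contained.
Floats multiply_sse(const Floats& a, const Floats& b) {
    return multiply_generic(a, b);
}

// Assumption: GCC or Clang on x86. Other toolchains report "no SSE2".
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

// Choose the implementation once, at run-time, based on the CPU.
MultiplyFn pick_multiply() {
    return cpu_has_sse2() ? multiply_sse : multiply_generic;
}
```

A caller would do something like "MultiplyFn f = pick_multiply();"
once at startup and then call f(a, b) everywhere, so a single binary
serves both kinds of machine.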
What you are talking about is a difference between targeting the least-
common-denominator of platforms, and targeting specific platforms --
which is an issue present no matter what language you are using. You
are not talking about a difference between any two specific languages.
Furthermore, what *I* was talking about is a difference between
different compilers targeting the same platform. (I am sorry my point
wasn't clear. I had meant to show that the compiler can generate very
fast code for you, and that on GCC -ffast-math in particular [which
does not target specific hardware features] did more than the
architecture-specific options.)
> The JIT compiler knows what processor it is running on,
> and can generate code specifically for that processor.
True, and a valid point. However, depending on the nature of what you
are compiling, you could easily lose a lot of information about the
original intention of the code in the initial optimizing pass, and
that loss limits how effective the JIT compiler's processor-specific
optimizations can be.
For example (this is a very specific example but please extend it to
general cases): let's say you have some Java code that, I don't know,
multiplies the elements of two large arrays of floating-point values
together in a loop. You compile it to bytecode, and in the process
javac unrolls that loop to optimize a bit. Then you run it on a
platform that, say, provides a specific instruction for multiplying
large arrays of floating-point values together very quickly. If the
original compiler had targeted this platform to begin with, it could
have seen your access pattern and used that optimized instruction.
Instead it unrolled the loop, a least-common-denominator optimization.
Now the JIT compiler sees the unrolled loop, and because it does not
see the sequential access pattern that was very apparent in the
original code, it fails to analyze the unrolled loop and determine
that it should be "re-rolled" and use the optimized instruction
instead. Now this particular example might not be that great, I know,
because you could conceivably construct a JIT compiler that can
recognize and re-roll unrolled loops; but the general point stands:
by compiling code you can lose a lot of information that you cannot
recover, and this information could be critical to performing the
best optimizations right away.
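To make the loop example concrete, here is a small sketch, written in
C++ rather than Java and with hypothetical function names. It shows
the same array multiplication in its original form and unrolled four
elements at a time by hand. Both compute identical results, but the
second form no longer states the simple "one element after another"
pattern directly, and that is the kind of information a later
compilation stage would have to rediscover.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original form: the sequential access pattern is plain to see.
std::vector<float> multiply_rolled(const std::vector<float>& a,
                                   const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}

// Same computation after a generic unroll-by-four transformation.
std::vector<float> multiply_unrolled4(const std::vector<float>& a,
                                      const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> out(n);
    std::size_t i = 0;
    // Main body: four independent multiplications per iteration.
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     * b[i];
        out[i + 1] = a[i + 1] * b[i + 1];
        out[i + 2] = a[i + 2] * b[i + 2];
        out[i + 3] = a[i + 3] * b[i + 3];
    }
    // Remainder: handle the last 0 to 3 elements one at a time.
    for (; i < n; ++i) {
        out[i] = a[i] * b[i];
    }
    return out;
}
```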
Consequently, the JIT compiler must turn to other techniques for
optimization, such as caching compiled code for faster startup times.
Again, it's a whole different beast.
> Thus, I won't
> use anything other than /O2 for c++... because, as I said, when you
> are deploying a commercial software, you will have to use flags that
> target the least-common-denominator processor anyway.
I addressed that above. Again, 1) some optimizations, like -ffast-math
or -funroll-loops, do not target a specific processor (be careful, of
course: you can't use these blindly, you have to make sure they make
sense for your code), and 2) there is no rule that says you have to
target the least-common-denominator processor. A heavy-duty research
application, such as what James Kanze *originally* cited, would most
certainly take advantage of special features on the target hardware.
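As a small illustration of why -ffast-math cannot be used blindly:
floating-point addition is not associative, and a fast-math mode is
allowed to regroup such expressions. The sketch below (hypothetical
function names, and it assumes ordinary IEEE double arithmetic without
fast-math enabled) spells out the two groupings by hand so the
difference is visible.

```cpp
// Sum grouped left to right: (a + b) + c.
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

// Same values, regrouped: a + (b + c).
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

// True when the grouping changes the answer for these inputs.
bool grouping_matters(double a, double b, double c) {
    return sum_left(a, b, c) != sum_right(a, b, c);
}

// With a = 1e20, b = -1e20, c = 1.0:
//   sum_left  gives 1.0  (the large values cancel first)
//   sum_right gives 0.0  (the 1.0 is absorbed by -1e20 first)
```

If your code's correctness depends on which of those two answers it
gets, it is not a good candidate for that flag; if it does not, the
compiler is free to pick whichever order is fastest.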
Jason