Java processors

 
 
Lew
 
      07-08-2012
BGB wrote:
> but, hell, with certain compilers (*cough* MSVC *cough*), it is an


I know. Kinda makes ya gag just to mention it, doesn't it?

> optimization setting mostly to have it cache the values of variables in
> registers (much less anything much more advanced that this).


That is rather oversimplifying the optimization options available.

> caching variables in registers and using basic peephole optimizations actually
> goes a long ways towards generating "optimized" compiler output.


What do you mean by the quotation marks?

How long is a "long ways" and compared to what?

> many other (potentially more significant) optimizations are higher-level, and
> don't necessarily actually make much of a difference at the lower-levels.


What do you mean by "higher-level" and "lower-levels [sic]"?

Of which particular optimizations do you speak?

HotSpot and other Java JIT compilers have an advantage over static optimizers
such as you describe - they can account for current run-time conditions.

For example, it might be that only one thread is using a section of code,
so all synchronization operations can be removed for a while.

Or perhaps there are no aliases extant for a given member variable, so it is
safe to enregister the value for a while, even though statically it would not
be safe.

HotSpot also will "unJIT" code - go back to the interpreted bytecode and drop
the machine-code compilation - when circumstances change.
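For a rough illustration (a sketch of my own, not lifted from the HotSpot docs): the StringBuffer below never escapes the method, so once escape analysis proves that, the JIT is free to elide the locking its synchronized methods would otherwise perform.

public final class LockElisionExample {
    // StringBuffer's methods are synchronized, but sb never escapes
    // this method, so escape analysis lets the JIT drop those locks.
    static String label(int n) {
        StringBuffer sb = new StringBuffer();
        sb.append("item-");
        sb.append(n);
        return sb.toString();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000000; i++) {
            label(i);
        }
    }
}

A static compiler has to assume the worst; the JIT can make the optimistic call and back out if conditions change.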

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedi.../c/cf/Friz.jpg


 
 
 
 
 
Lew
 
      07-08-2012
Roedy Green wrote:
>> That is just what JITs do. It is only after a while, when they have gathered
>> some stats, that they decide which classes to turn to machine code. The
>> astounding thing is they stop the interpreter in mid flight executing
>> a method, and replace it with machine code and restart it. That to me
>> is far more impressive than walking on water.

>
> I wonder how long until hyperjits. They would burn a programmable
> gate array to handle the innermost loops once they saw it was
> justified. It would be optimised to the gate level.


How would these guys "unburn" the gates - reprogram them to a different
configuration?

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedi.../c/cf/Friz.jpg


 
 
 
 
 
Peter J. Holzer
 
      07-08-2012
On 2012-07-07 02:16, Roedy Green <(E-Mail Removed)> wrote:
> On Sat, 07 Jul 2012 00:13:44 +0200, Silvio Bierman <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>>I used to love working on a DEC-PDP11 running Unix. A great machine and
>>it had a Modula 2 compiler to boot.

[...]
> One of the most fun machines was the MINC with RT-11.

[...]
>
> Is DEC/VAX still around?


DEC replaced the VAX architecture (which was quite different from the
PDP11 architecture) with the Alpha architecture 20 years ago. DEC was
bought by Compaq a few years later which in turn was bought by HP.

There may still be VAXes running, though (I saw one just a few years
ago).

hp


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 
 
Peter J. Holzer
 
      07-08-2012
On 2012-07-08 04:19, Gene Wirchenko <(E-Mail Removed)> wrote:
> On Sat, 07 Jul 2012 11:38:13 -0400, Eric Sosman
><(E-Mail Removed)> wrote:
>> The Wikipedia article on the VSE makes interesting reading. This
>>bit I found somewhat eyebrow-raising:
>>
>> The history of the exchange's index provides a standard case
>> example of large errors arising from seemingly innocuous
>> floating point calculations. [...] The accumulated truncations
>> led to an erroneous loss of around 25 points per month."
>>
>>Not enough RAM to retain the "unimportant" digits?

>
> Probably not. I would blame the use of FP.


Actually, it looks like they used a *fixed point* representation (3
decimal digits), not a floating point representation (although that
doesn't make much difference in this case since the magnitude stays
about the same). The obvious error is that they always truncated the
value, thus accumulating the error[1]. If they had rounded to nearest
most errors would have cancelled out. If they had used double precision
binary floating point numbers and not rounded to 3 decimal digits after
each trade the error would have been too small to notice (about 1 point
in 30 million years).
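A quick way to see the effect (a toy simulation I sketched, not the
exchange's actual data or algorithm):

import java.util.Random;

public class TruncationDrift {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        double exact = 1000.0, truncated = 1000.0, rounded = 1000.0;
        for (int i = 0; i < 100000; i++) {
            // small random index movement per "trade"
            double change = (rnd.nextDouble() - 0.5) * 0.1;
            exact += change;
            // always chop to 3 decimals (what the exchange reportedly did)
            truncated = Math.floor((truncated + change) * 1000) / 1000.0;
            // round to nearest 3 decimals instead
            rounded = Math.round((rounded + change) * 1000) / 1000.0;
        }
        System.out.println("exact:     " + exact);
        System.out.println("truncated: " + truncated);
        System.out.println("rounded:   " + rounded);
    }
}

Running it, the truncated value drifts steadily downward while the
rounded one stays within noise of the exact sum.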

Anyway, that's a great example. I will use it in the next "binary
floating point is evil" discussion.

hp

[1] The second error was computing the new value from the old value and
the change at all - that's pretty much guaranteed to accumulate some
error over time.


--
_ | Peter J. Holzer | Deprecating human carelessness and
|_|_) | Sysadmin WSR | ignorance has no successful track record.
| | | (E-Mail Removed) |
__/ | http://www.hjp.at/ | -- Bill Code on (E-Mail Removed)
 
 
BGB
 
      07-08-2012
On 7/8/2012 1:00 AM, Lew wrote:
> BGB wrote:
>> but, hell, with certain compilers (*cough* MSVC *cough*), it is an

>
> I know. Kinda makes ya gag just to mention it, doesn't it?
>


well, I actually use MSVC, but using MSVC while also knowing ASM doesn't
leave one all that impressed with its ASM output.


>> optimization setting mostly to have it cache the values of variables in
>> registers (much less anything much more advanced that this).

>
> That is rather oversimplifying the optimization options available.
>


it is very minimal, but this is what you get with basic optimization
options with MSVC.

AFAICT, these are on by default (without explicit optimization
settings) with GCC.

it is also a little annoying that it can't use optimization and
profile/debug settings at the same time.


however, there are a few merits:
it has full access to the Windows API;
it has Visual Studio;
it can do .NET stuff;
....

my 3D engine is also mostly GPU-bound, so being compiled with debug
settings doesn't really hurt the overall performance too badly.


>> caching variables in registers and using basic peephole optimizations
>> actually
>> goes a long ways towards generating "optimized" compiler output.

>
> What do you mean by the quotation marks?
>


these are actually fairly naive optimizations, compared with what is
possible.

their relative effectiveness implies one of two things:
either many commonly used compilers are not very good on this front,
or more advanced optimizations tend not to actually buy a whole lot
(more a case of diminishing returns).


I was going to give an example from another area, but it turned out to be
awkwardly long: basically, pointing out the cost/benefit tradeoffs which
led to the present near-dominance of Huffman compression over
Arithmetic Coding and high-order context modeling (PAQ / PPMd / ...),
despite Huffman not compressing nearly as well.

although, to be fair, many more recent codecs (LZMA, H.264, ...) use AC,
so things may be shifting slightly in its favor (the added compression
outweighing the higher time-cost).

but, yeah, there is often a lot more that could be done, except that the
costs may make it unreasonable or impractical to do so.


> How long is a "long ways" and compared to what?
>


0x40d0cc 1252 obuf[l0+0]=r0; obuf[l0+1]=g0; obuf[l0+2]=b0; obuf[l0+3]=a; 0.68
0x40d0cc mov eax,[ebp-54h] 8B 45 AC 0.07
0x40d0cf add eax,[ebp-10h] 03 45 F0
0x40d0d2 mov cl,[ebp-78h] 8A 4D 88
0x40d0d5 mov [eax],cl 88 08 0.07
0x40d0d7 mov edx,[ebp-54h] 8B 55 AC 0.02
0x40d0da add edx,[ebp-10h] 03 55 F0
0x40d0dd mov al,[ebp-14h] 8A 45 EC 0.04
0x40d0e0 mov [edx+01h],al 88 42 01
0x40d0e3 mov ecx,[ebp-54h] 8B 4D AC 0.01
0x40d0e6 add ecx,[ebp-10h] 03 4D F0 0.08
0x40d0e9 mov dl,[ebp-08h] 8A 55 F8 0.02
0x40d0ec mov [ecx+02h],dl 88 51 02
0x40d0ef mov eax,[ebp-54h] 8B 45 AC 0.11
0x40d0f2 add eax,[ebp-10h] 03 45 F0 0.01
0x40d0f5 mov cl,[ebp-28h] 8A 4D D8 0.16
0x40d0f8 mov [eax+03h],cl 88 48 03 0.09

compiler = MSVC, source language = C.


it can actually get a lot worse than this, but it illustrates the basic
idea (without being too long).


for example, what if the compiler cached intermediate values in registers?
then the output above would more likely look something like:
mov eax,[ebp-54h]
add eax,[ebp-10h]
mov cl,[ebp-78h]
mov [eax],cl
mov dl,[ebp-14h]
mov [eax+01h],dl
mov cl,[ebp-08h]
mov [eax+02h],cl
mov dl,[ebp-28h]
mov [eax+03h],dl



>> many other (potentially more significant) optimizations are
>> higher-level, and
>> don't necessarily actually make much of a difference at the lower-levels.

>
> What do you mean by "higher-level" and "lower-levels [sic]"?
>
> Of which particular optimizations do you speak?
>


higher-level:
constant folding;
object lifetime analysis;
ability to skip out on certain safety checks;
scope visibility analysis and type-inference (mostly N/A to Java, more
relevant to languages like ECMAScript);
....

lower-level:
register allocation strategies;
peephole optimization;
....


higher-level optimizations can usually be done in advance of generating
the output code, and they don't particularly depend on the type of
output being produced (target architecture, ...).

whereas things like register allocation depend much more on the target
architecture, and are more closely tied to the compiler output being
produced.
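for example (a contrived sketch of my own, just to illustrate the sort of
thing I mean by "higher-level"):

public class HighLevelOpts {
    // neither opportunity depends on the target architecture:
    static long sumScaled(int[] data) {
        final int secondsPerDay = 24 * 60 * 60;  // constant folding -> 86400
        long total = 0;
        for (int i = 0; i < data.length; i++) {
            // the bounds check on data[i] is provably redundant given the
            // loop condition, so it can be eliminated (safety-check removal)
            total += (long) data[i] * secondsPerDay;
        }
        return total;
    }
}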


> HotSpot and other Java JIT compilers have an advantage over static
> optimizers such as you describe - they can account for current run-time
> conditions.
>
> For example, it might be that none but one thread are using a section of
> code so all synchronization operations can be removed for a while.
>
> Or perhaps there are no aliases extant for a given member variable, so
> it is safe to enregister the value for a while, even though statically
> it would not be safe.
>
> HotSpot also will "unJIT" code - go back to the interpreted bytecode and
> drop the machine-code compilation - when circumstances change.
>


I wasn't focusing solely on static compilers, as a lot of this applies
to JIT compilers as well.


yes, but the question would be how many of these would risk compromising
the ability of the VM to readily switch between the JIT output and bytecode.

very possibly, the JIT would be focusing more on optimizations which
would not hinder its own operation.

an example would be maintaining "sequential consistency", where
theoretically, an optimizer would alter the relative order in which
operations take place, or reorganize the control flow within a method, ...

although possible, this would hinder the ability to easily jump into or
out-of the JITed output code, so a JIT would likely refrain from doing
so (upholding the behavior that events take place in the native code in
the same relative order as they appear in the bytecode, ...).

very likely, the JIT would also arrange that the overall state is
consistent at points where it may jump into or out of the generated code
(all values properly stored in their respective variables, ...).
 
 
BGB
 
      07-08-2012
On 7/8/2012 8:40 AM, Wanja Gayk wrote:
> In article <jt5okh$53a$(E-Mail Removed)>, (E-Mail Removed)d
> says...
>
>> My colleague's point was that JITting the code, aggressively
>> or not, is pre-execution overhead: It is work spent on something
>> other than running your code.

>
> That's a pretty single-threaded point of view.
>
>> If you just dove in and started
>> interpreting you might be running more slowly, but you'd have a
>> head start: Achilles is the faster runner, but cannot overcome
>> the tortoise's lead if the race is short.

>
> In fact optimization, profiling and compiling can be done while the code
> is being interpreted.
>


potentially, but some profiling activities will risk reducing
interpreter performance, since they have to be done inline with
interpreting the code (for example, updating counters, ...).

if the program is both multi-threaded and CPU-bound, then the effects of
a JIT running in a different thread may be visible, but granted, this is
also fairly unlikely for many apps (many are not CPU-bound, and many that
are CPU-bound have most of their activity concentrated in 1 or 2
threads anyways).


>> I dunno: Are JIT's nowadays smart enough to recognize code
>> that will (future tense) execute too few times to be worth JITting?

>
> As far as I know it does take things like simple loop counters into
> account when comparing invocation counts to the compilation threshold,
> so it can compile more eagerly.
>


yep.

a simple strategy like "compile this code once it has been executed N
times" actually works reasonably well.

code executed less than this remains interpreted, and probably isn't
really a huge time-waster anyways, whereas code which is executed often
will more quickly trigger the JIT to "do its thing" (causing it to go
faster).
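with HotSpot you can actually watch this happen: something like the
following (just a quick demo sketch of mine), run with
-XX:+PrintCompilation, shows the hot method appearing in the compilation
log once its counters cross the threshold (the threshold itself can be
tweaked with -XX:CompileThreshold):

// run with:  java -XX:+PrintCompilation HotLoop
public class HotLoop {
    // starts out interpreted; once the invocation/backedge counters pass
    // the compile threshold, it shows up in the PrintCompilation output
    static long hotSquare(long x) {
        return x * x;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (long i = 0; i < 1000000; i++) {
            sum += hotSquare(i);
        }
        System.out.println(sum);
    }
}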

 
 
Lew
 
      07-08-2012
BGB wrote:
> Lew wrote:
>> HotSpot and other Java JIT compilers have an advantage over static
>> optimizers such as you describe - they can account for current run-time
>> conditions.
>>
>> For example, it might be that none but one thread are using a section of
>> code so all synchronization operations can be removed for a while.
>>
>> Or perhaps there are no aliases extant for a given member variable, so
>> it is safe to enregister the value for a while, even though statically
>> it would not be safe.
>>
>> HotSpot also will "unJIT" code - go back to the interpreted bytecode and
>> drop the machine-code compilation - when circumstances change.
>>

>
> I wasn't focusing solely on static compilers, as a lot of this applies to JIT
> compilers as well.
>
>
> yes, but the question would be how many of these would risk compromising the
> ability of the VM to readily switch between the JIT output and bytecode.


None of them.

> very possibly, the JIT would be focusing more on optimizations which would not
> hinder its own operation.


None of the optimizations it performs hinders its own operation, so that's
trivially true.

> an example would be maintaining "sequential consistency", where theoretically,
> an optimizer would alter the relative order in which operations take place, or
> reorganize the control flow within a method, ...


You explained the other use of quotation marks quite well. What is the intent
of these here?

The optimizer does, indeed, alter the naive order of operations as it deems
helpful, according to what I've read.

It does so, when it does so, in a way that does not break the observed order
of events as mandated by the JLS.

> although possible, this would hinder the ability to easily jump into or out-of
> the JITed output code, so a JIT would likely refrain from doing so (upholding
> the behavior that events take place in the native code in the same relative
> order as they appear in the bytecode, ...).


The compiler avoids breaking the promise mandated by the JLS. It does not bind
itself further than that with respect to altering the order of events.

When you say "likely", how likely and on what are you basing your probability
estimate?

It is 100% certain that the optimizer doesn't break the promise of execution
order mandated by the JLS. It has nothing to do with jumping into or "out-of
[sic]" native code. It has to do with maintaining mandatory program semantics.

The optimizer cheerfully rearranges execution order in ways that do not
violate the promise, when it finds it advantageous to do so. I do not know how
to estimate the probability of that happening, except that documentation
claims that it does sometimes happen.
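A classic illustration of what the JLS does and does not promise (my own
sketch, not taken from the HotSpot documentation):

public class ReorderingExample {
    static int data = 0;
    static boolean ready = false;   // declare volatile to forbid the reordering

    static void writer() {
        data = 42;
        ready = true;   // without volatile, this store may become visible first
    }

    static void reader() {
        if (ready) {
            System.out.println(data);   // may legally print 0 as written
        }
    }

    public static void main(String[] args) {
        new Thread(new Runnable() { public void run() { writer(); } }).start();
        new Thread(new Runnable() { public void run() { reader(); } }).start();
    }
}

The reordering is legal precisely because nothing in the program
constrains it; add the volatile and the optimizer must preserve the
observable order, reorder what it likes otherwise.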

> very likely, the JIT would also arrange that the overall state is consistent
> at points where it may jump into or out of the generated code (all values
> properly stored in their respective variables, ...).


Again with the "very likely". How likely is that? Based on what evidence?



--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedi.../c/cf/Friz.jpg


 
 
Roedy Green
 
      07-08-2012
On Sat, 07 Jul 2012 23:04:37 -0700, Lew <(E-Mail Removed)> wrote,
quoted or indirectly quoted someone who said :

>
>How would these guys "unburn" the gates - reprogram them to a different
>configuration?


they would not. Presumably these things would be so cheap that you
would just replace a bank that had too many pieces of abandoned code,
and repack them onto a fresh bank, like the way the Replicator repacks
zips when they get too full of junk. If they were sufficiently
dense, e.g. some sort of 3D array, you might just keep everything.

Also somebody might invent an erasable array, where you don't
literally burn, just program.

I hope somebody does this in the lab, no matter how impractical, just to
find out the order-of-magnitude efficiency improvement that could be
had.
--
Roedy Green Canadian Mind Products
http://mindprod.com
Why do so many operating systems refuse to define a standard
temporary file marking mechanism? It could be a reserved lead character
such as the ~ or a reserved extension such as .tmp.
It could be a file attribute bit. Because they refuse, there is no
fool-proof way to scan a disk for orphaned temporary files and delete them.
Further, you can't tell where the orphaned files came from.
This means the hard disks gradually fill up with garbage.

 
 
Martin Gregorie
 
      07-08-2012
On Sat, 07 Jul 2012 23:00:13 -0700, Lew wrote:

> HotSpot also will "unJIT" code - go back to the interpreted bytecode and
> drop the machine-code compilation - when circumstances change.
>

In view of this, why are we assuming that the JIT stops a method in its
tracks, compiles a native version and continues execution in that from
the stopping point? Are we sure that it doesn't do something along the
lines of stopping the bytecode execution (as above), doing a JIT native
compilation and, when it's finished, wrapping both the bytecode version
and the native binary version in a switch mechanism, setting the version
selector to 'native' and then restarting the bytecode version? Something
like this would sidestep the wherethefugarwi problem of stopping bytecode
and restarting in the optimised native binary, while making sure that the
*next* execution runs the JITed native binary.
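Something like this, conceptually (a back-of-the-envelope sketch of the
idea, not a claim about how HotSpot actually does it):

// the entry point dispatches through a "version selector"; the background
// compiler flips it to the native stub when compilation finishes, so the
// *next* call runs native code while the current interpreted run completes.
class MethodEntry {
    private final Runnable interpreted;
    private volatile Runnable current;

    MethodEntry(Runnable interpreted) {
        this.interpreted = interpreted;
        this.current = interpreted;        // start out interpreted
    }

    void invoke() {
        current.run();
    }

    void compiledVersionReady(Runnable compiled) {
        current = compiled;                // selector set to 'native'
    }

    void deoptimize() {
        current = interpreted;             // drop back to bytecode
    }
}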



--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
 
 
Lew
 
      07-08-2012
Martin Gregorie wrote:
> Lew wrote:
>> HotSpot also will "unJIT" code - go back to the interpreted bytecode and
>> drop the machine-code compilation - when circumstances change.
>>

> In view of this, why are we assuming that the JIT stops a method in its
> tracks, compiles a native version and continues execution in that from
> the stopping point? Are we sure that it doesn't do something along the


"We"?

I, at least, make no such assumption.

> lines of stopping the bytecode execution (as above), doing a JIT native
> compilation and, when it's finished, wrapping both the bytecode version
> and the native binary version in a switch mechanism, setting the version
> selector to 'native' and then restarting the bytecode version? Something
> like this would sidestep the wherethefugarwi problem of stopping bytecode
> and restarting in the optimised native binary, while making sure that the
> *next* execution runs the JITed native binary.


As I understand the white papers, HotSpot, at least, does not stop the
interpreter while it's compiling, but does background compilation.

I don't have a reference handy just now, but HotSpot does do something like
the flip you describe, save for the "stopping bytecode" part, IIRC.

HotSpot is not a JIT compiler, as they go through some pains to emphasize.
It's an after-the-fact compiler.

<http://www.oracle.com/technetwork/java/whitepaper-135217.html#3>
"The compiler must not only be able to detect when these optimizations become
invalid due to dynamic loading, but also be able to undo or redo those
optimizations during program execution, even if they involve active methods on
the stack. This must be done without compromising or impacting Java
technology-based program execution semantics in any way."

"[T]he Java HotSpot VM immediately runs the program using an interpreter, and
analyzes the code as it runs to detect the critical hot spots in the program.
Then it focuses the attention of a global native-code optimizer on the hot spots."

"Dynamic Deoptimization"
<http://www.oracle.com/technetwork/java/whitepaper-135217.html#dynamic>

"Compiler Optimizations"
<http://www.oracle.com/technetwork/java/whitepaper-135217.html#optimizations>

"... the Server VM performs extensive profiling of the program in the
interpreter before compiling the Java bytecode to optimized machine code. This
profiling data provides even more information to the compiler about data types
in use, hot paths through the code, and other properties. The compiler uses
this information to more aggressively and optimistically optimize the code in
certain situations. If one of the assumed properties of the code is violated
at run time, the code is deoptimized and later recompiled and reoptimized."
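To make the "assumed property" concrete (my own sketch, not an example
from the white paper): while only one subclass of a type has been loaded,
calls through the base type can be devirtualized and inlined; loading a
second subclass later invalidates that assumption, and the deoptimization
machinery described above kicks in.

abstract class Shape { abstract double area(); }

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

// While Circle is the only Shape subclass the VM has loaded, the call
// s.area() below is monomorphic and can be inlined. If a Square subclass
// is loaded later (say, via reflection), that optimistic assumption is
// violated and the compiled code must be deoptimized and recompiled.
class AreaSum {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();
        }
        return sum;
    }
}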

--
Lew
Honi soit qui mal y pense.
http://upload.wikimedia.org/wikipedi.../c/cf/Friz.jpg


 