Velocity Reviews - Computer Hardware Reviews

ANSI C problem on P4 under Linux & Windows

 
 
VNG
Guest
Posts: n/a
 
      08-22-2004
I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
under Linux gnu, and ran under P3, P4 and AMD.

It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
Linux not only that it runs slower (while AMD is 40 times faster), but it also
produces wrong numerical results...

Any suggestion what can be the problem?

How to fix the P4 speed under MSVC++ (SP6)?
How to fix P4's speed and numerical result under Linux?

Here's some more details about the compilation:
GNU:
CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
-funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


Basically one of the most intensive loops (that we suspect in but aren't sure if
it causes the problem) looks like this:

static long loop_order;

void functionname ()
{
register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
register long j;
:
{
register float c1, c2;
j = loop_order;
while (j--)
{
acc = *itPtr-- * c1;
acc += *itPtr-- * c2;
acc += *itPtr++ * c3;
*cPtr++ += *iPtr1++ * acc;
}
}
:
}

We have tried to eliminate the use of the word "register" and redefined "j" as
volatile, no change.


Thanks,
-- VNG








 
 
 
 
 
SM Ryan
Guest
Posts: n/a
 
      08-22-2004
# {
# register float c1, c2;
# j = loop_order;
# while (j--)
# {
# acc = *itPtr-- * c1;
# acc += *itPtr-- * c2;
# acc += *itPtr++ * c3;
# *cPtr++ += *iPtr1++ * acc;
# }
# }

Is there some reason to keep loading itPtr[-1] and itPtr[-2]
inside the loop instead of outside?
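
One way to read this suggestion: each pass reads three neighbouring samples and then steps itPtr back by one, so two of the three values read in one pass are read again in the next. A minimal sketch, assuming a hypothetical function signature (the original post shows neither how the pointers are initialized nor where c3 comes from), that carries those two values over in local variables:

```c
/* Sketch only: the signatures, the parameter names and the c3 parameter
   are assumptions; the original post does not show how they are set up. */

/* Pointer-walking form, as in the posted loop. */
void fir3_posted(float *cPtr, const float *iPtr1, const float *itPtr,
                 float c1, float c2, float c3, long n)
{
    while (n--) {
        float acc;
        acc  = *itPtr-- * c1;
        acc += *itPtr-- * c2;
        acc += *itPtr++ * c3;   /* net effect: itPtr moves back by one */
        *cPtr++ += *iPtr1++ * acc;
    }
}

/* Same arithmetic, but the two overlapping samples are carried over
   between passes, so only one new sample is loaded per pass. */
void fir3_carried(float *c, const float *in, const float *it,
                  float c1, float c2, float c3, long n)
{
    float x0, x1;
    long j;

    if (n <= 0)
        return;
    x0 = it[0];
    x1 = it[-1];
    for (j = 0; j < n; j++) {
        float x2 = it[-j - 2];  /* the only new sample in this pass */
        float acc = x0 * c1;
        acc += x1 * c2;
        acc += x2 * c3;
        c[j] += in[j] * acc;
        x0 = x1;                /* slide the window back by one sample */
        x1 = x2;
    }
}
```

Whether the second form is actually faster is something to measure; an optimizing compiler may already perform this transformation on its own.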

--
SM Ryan http://www.rawbw.com/~wyrmwif/
One of the drawbacks of being a martyr is that you have to die.
 
 
 
 
 
Profetas
Guest
Posts: n/a
 
      08-22-2004
which OS do you have in your P3?

newer OS/compiler may not use the register to store your vars which will
be slower

 
 
Jens.Toerring@physik.fu-berlin.de
Guest
Posts: n/a
 
      08-22-2004
Profetas <(E-Mail Removed)> wrote:
> which OS do you have in your P3?


Did you even read the post? The OP states all of that at the start of
his article.

> newer OS/compiler may not use the register to store your vars which will
> be slower


That's simply BS. First of all, 'register' was never more than a
hint to the compiler that a variable will be used a lot and that
it might be a good idea to store it in a register. But the compiler
was always free to disregard this hint. Moreover, newer compilers
are usually quite good at figuring out such things, so you usually
don't need the 'register' keyword anymore because the compiler will
automatically pick the most suitable variables to keep in
registers. And, finally, this hasn't got anything at all to do with
the OS.
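
To illustrate the point about 'register' being only a hint, here is a minimal sketch with two made-up functions (the names sum_plain and sum_hinted are hypothetical and not from the OP's program). They compute the same result; whether the generated code differs at all is up to the compiler. The one effect the language does guarantee is that the address of a 'register' variable cannot be taken.

```c
/* Hypothetical example functions, not from the original program. */

long sum_plain(const int *v, long n)
{
    long total = 0;
    long i;
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

long sum_hinted(const int *v, long n)
{
    register long total = 0;   /* a hint only; the compiler may ignore it */
    register long i;
    for (i = 0; i < n; i++)
        total += v[i];
    /* Writing &total here would be a constraint violation: that is the
       only effect of 'register' that the language guarantees. */
    return total;
}
```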
Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)-berlin.de
\__________________________ http://www.toerring.de
 
 
Jens.Toerring@physik.fu-berlin.de
Guest
Posts: n/a
 
      08-22-2004
VNG <(E-Mail Removed)> wrote:
> I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
> under Linux gnu, and ran under P3, P4 and AMD.


> It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
> problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and under
> Linux not only that it runs slower (while AMD is 40 times faster), but it also
> produces wrong numerical results...


> Any suggestion what can be the problem?


> How to fix the P4 speed under MSVC++ (SP6)?
> How to fix P4's speed and numerical result under Linux?


> Here's some more details about the compilation:
> GNU:
> CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
> -funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused


No idea about the speed issues - and that's rather off-topic here,
because it's about the behavior of certain compilers in combination
with certain processors, none of which has much to do with C. As
for the wrong results with gcc, have another look at the info
pages concerning the -ffast-math option:

> This option should never be turned on by any `-O' option since it
> can result in incorrect output for programs which depend on an
> exact implementation of IEEE or ISO rules/specifications for math
> functions.


Perhaps it has something to do with this...
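
A minimal sketch of why this matters: floating-point addition is not associative, and options such as -ffast-math allow the compiler to regroup operations as if it were. The helper names below are made up for illustration, the 'volatile' temporaries are only there to force each intermediate result to be rounded to float, and the example assumes that float is IEEE 754 single precision.

```c
/* Hypothetical helpers; assumes float is IEEE 754 single precision. */

/* Computes (a + b) + c. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;   /* force rounding to float */
    return t + c;
}

/* Computes a + (b + c). */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;   /* force rounding to float */
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, sum_left gives 1 while sum_right gives 0, because -1e8f + 1.0f rounds back to -1e8f in single precision. A loop that accumulates many products, like the one posted, can be affected in the same way when the compiler is allowed to reorder it.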

In your place I would probably start by throwing out all those
options and testing carefully which of them really make a difference
- some of them could even result in a slow-down when used with the
wrong processor type. And your code is so obfuscated (and not the
code you're actually using, by the way) that a compiler might have
problems figuring out how to optimize it. Try to rewrite it in an
understandable form and you might have a much better chance of getting
it optimized. If you then find it's too slow you can still try to
micro-optimize (but expect the effect to differ between compilers
and processors).

> Basically one of the most intensive loops (that we suspect in but aren't sure if
> it causes the problem) looks like this:


Profiling your code would probably be better than just guessing...

> static long loop_order;


> void functionname ()
> {
> register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;


iPtr is twice defined, that should get the compiler quite a bit upset.

> register long j;
> :


What's that colon good for?

> {


Why wrap this in another block?

> register float c1, c2;


Where do c1 and c2 ever get assigned values?

> j = loop_order;
> while (j--)
> {
> acc = *itPtr-- * c1;


itPtr has never been assigned a value.

> acc += *itPtr-- * c2;
> acc += *itPtr++ * c3;


c3 is never defined anywhere.

> *cPtr++ += *iPtr1++ * acc;


cPtr and iPtr1 also didn't get assigned values.

> }
> }


Now, what the hell is all that supposed to do?

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)-berlin.de
\__________________________ http://www.toerring.de
 
 
CBFalconer
Guest
Posts: n/a
 
      08-22-2004
VNG wrote:
>

.... snip about systems - OT ...
>
> Basically one of the most intensive loops (that we suspect in but
> aren't sure if it causes the problem) looks like this:
>
> static long loop_order;
>
> void functionname ()
> {
> register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
> register long j;
> :
> {
> register float c1, c2;
> j = loop_order;
> while (j--)
> {
> acc = *itPtr-- * c1;
> acc += *itPtr-- * c2;
> acc += *itPtr++ * c3;
> *cPtr++ += *iPtr1++ * acc;
> }
> }
> :
> }
>
> We have tried to eliminate the use of the word "register" and
> redefined "j" as volatile, no change.


What are those isolated colons doing? The register keyword seems
pointless, as does the volatile. Initializing the various
pointers might help. Same for the cNs. c3 seems to be undefined.
The time for multiplication can vary greatly with the operands
(denormalized floats, for example, can be many times slower than
normal values on some processors).

As ever, first measure. It should not be any great effort to do
some profiling runs.
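
A profiler is the proper tool, but even a crude timing harness built on the standard clock() function gives numbers that can be compared across machines. A minimal sketch, where time_calls and busy_work are hypothetical names and busy_work merely stands in for the routine under suspicion:

```c
#include <time.h>

/* Written to by busy_work so the compiler cannot discard the loop. */
static volatile float sink;

/* Hypothetical stand-in for the routine being measured. */
void busy_work(void)
{
    float acc = 0.0f;
    long i;
    for (i = 0; i < 1000; i++)
        acc += (float)i * 0.5f;
    sink = acc;
}

/* Returns the CPU time in seconds used by 'reps' calls of 'fn',
   or -1.0 if clock() cannot report the processor time. */
double time_calls(void (*fn)(void), long reps)
{
    clock_t start, end;
    long r;

    start = clock();
    if (start == (clock_t)-1)
        return -1.0;
    for (r = 0; r < reps; r++)
        fn();
    end = clock();
    if (end == (clock_t)-1)
        return -1.0;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls on each suspect routine with the same repetition count on the P3 and on the P4 would show which routine accounts for the difference, instead of guessing.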

--
fix (vb.): 1. to paper over, obscure, hide from public view; 2.
to work around, in a way that produces unintended consequences
that are worse than the original problem. Usage: "Windows ME
fixes many of the shortcomings of Windows 98 SE". - Hutchison


 
 
Christian Bau
Guest
Posts: n/a
 
      08-22-2004
In article <5nWVc.5582$(E-Mail Removed)>,
VNG <(E-Mail Removed)> wrote:

> I have an ANSI C program that was compiled under Windows MSVC++ 6.0 (SP6) and
> under Linux gnu, and ran under P3, P4 and AMD.
>
> It runs fine on P3 and AMD under both Windows and Linux, but under P4 it has
> problems. Under Windows 3GHz P4 runs twice slower than 800MHz P3... and
> under
> Linux not only that it runs slower (while AMD is 40 times faster), but it
> also
> produces wrong numerical results...
>
> Any suggestion what can be the problem?
>
> How to fix the P4 speed under MSVC++ (SP6)?
> How to fix P4's speed and numerical result under Linux?
>
> Here's some more details about the compilation:
> GNU:
> CFLAGS=-O6 -fexpensive-optimizations -ffast-math -fno-strength-reduce
> -funroll-loops -fomit-frame-pointer -Wno-long-long -Wno-unused
>
>
> Basically one of the most intensive loops (that we suspect in but aren't sure
> if
> it causes the problem) looks like this:
>
> static long loop_order;
>
> void functionname ()
> {
> register float *iPtr, *itPtr, *iPtr1, *cPtr, acc;
> register long j;
> :
> {
> register float c1, c2;
> j = loop_order;
> while (j--)
> {
> acc = *itPtr-- * c1;
> acc += *itPtr-- * c2;
> acc += *itPtr++ * c3;
> *cPtr++ += *iPtr1++ * acc;
> }
> }
> :
> }


P4s dislike accessing data at certain distances from each other. If the
distance between the various pointer variables is a multiple of a large
power of two (for example 64 KB) then you might be in trouble.
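
A rough way to check for this is to compare the low bits of the addresses involved. A minimal sketch, assuming a C99 compiler for uintptr_t and taking the 64 KB figure from the example above; the function name is made up, and the exact distances that hurt depend on the processor model:

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the two addresses differ by a
   multiple of 64 KB (that is, their low 16 bits are equal), else 0.
   Relies on the usual flat mapping of pointers to uintptr_t, which
   the C standard does not guarantee. */
int same_64k_offset(const void *a, const void *b)
{
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return ((ua ^ ub) & (uintptr_t)0xFFFF) == 0;
}
```

If such a check reports a match for buffers that are used together in the inner loop, moving one of them by a small offset is a cheap experiment to run before and after timing the code.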
 
 
 
 