I'm back again

Sorry for the delay, but I was involved in another project and I
haven't found time to make the measurements you asked for.
On 25 Jan, 03:52, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> > You are saying that code A could be faster than code B on a platform
> > (CPU+compiler), but could be slower on another platform.
>
>     Yes.  If you believe A is always faster/slower and B is always
> slower/faster, regardless of platform, compiler version, compiler
> options, program context, and phase of the Moon, you are too young
> and inexperienced for the work you're engaged in.
Ok, ok...
>     It is surpassingly unlikely that a straight-line source-code
> sequence A will be faster than B in all circumstances.  Compilers
> are startlingly subtle, but are not magical.  So, what's a poor
> programmer to do?  There are several approaches, among them:
> [...]
>     ... and that is the LAST you will hear from me on this thread
> until and unless you offer some actual MEASUREMENTS!  If you say
> "I think" or "It stands to reason" or "Everyone knows," I will
> personally come after you with a big book of six-place logarithms
> and bludgeon you so hard your mantissa will fall off.
I made some measurements with an oscilloscope. The function is:
---
unsigned char data[] = {0x11, 0x22, 0x33, 0x44};
unsigned int x;
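void foo(void) {
    /* 1st pulse: cast, aligned address */
    PIN_SET(34);
    x = *(unsigned int *)&data[0];
    PIN_RESET(34);
    NOP();
    /* 2nd pulse: cast, misaligned address (not portable) */
    PIN_SET(34);
    x = *(unsigned int *)&data[1];
    PIN_RESET(34);
    NOP();
    /* 3rd pulse: byte composition starting at an aligned offset */
    PIN_SET(34);
    x = (data[0] << 8) + data[1];
    PIN_RESET(34);
    NOP();
    /* 4th pulse: byte composition starting at a misaligned offset */
    PIN_SET(34);
    x = (data[1] << 8) + data[2];
    PIN_RESET(34);
    NOP();
}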
---
The first pulse on pin 34 is for an aligned access. The second pulse
is for a misaligned access. The third and fourth pulses are for the
portable access (at an aligned and a misaligned offset, respectively).
I found the following (clock cycle is 62.5ns):
1st pulse: 1.13us (18 cycles)
2nd pulse: 1.25us (20 cycles)
3rd pulse: 1.63us (26 cycles)
4th pulse: 1.63us (26 cycles)
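The cycle counts are just the measured pulse widths divided by the
62.5ns clock period, rounded to the nearest integer. A small sketch of
that arithmetic follows; the helper name ns_to_cycles is hypothetical
and only for illustration, and the clock period is hard-coded as an
assumption taken from the figure above.

```c
/* Hypothetical helper: convert a pulse width in nanoseconds to clock
 * cycles, assuming a 62.5ns clock period (16 MHz).
 * Since 62.5ns = 125/2 ns, cycles = 2*ns/125; adding 62 before the
 * integer division rounds the result to the nearest whole cycle. */
static unsigned long ns_to_cycles(unsigned long ns)
{
    return (2UL * ns + 62UL) / 125UL;
}
```

With the widths read from the scope, ns_to_cycles(1130) gives 18,
ns_to_cycles(1250) gives 20 and ns_to_cycles(1630) gives 26.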
As expected, the portable access (3rd and 4th pulses) is slower than
the simple, non-portable cast. Still, the difference is small: 6 cycles
(375ns) against the misaligned cast and 8 cycles (500ns) against the
aligned one.
Anyway, I think you are right: the gain from the faster cast approach
is limited compared with the advantages of a portable (byte
composition) approach.
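
For reference, the portable approach could be wrapped in small helpers
like the ones below. This is only a sketch under my own assumptions:
the names load_be16 and load_native16 are hypothetical, the value is
taken to be 16 bits wide, and load_be16 assumes the high byte is stored
first. Neither helper has been timed on the target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a 16-bit value one byte at a time,
 * high byte first. Works at any address and gives the same result
 * regardless of the CPU's own byte order. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/* Hypothetical helper: copy two bytes into a 16-bit variable.
 * Also safe at any address, but the result follows the CPU's
 * native byte order, so it differs between platforms. */
static uint16_t load_native16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

load_be16 only ever reads single bytes, so the alignment of the
address does not matter; load_native16 is closer to what the cast
does, without requiring an aligned pointer.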