In article <(E-Mail Removed)>,

<(E-Mail Removed)> wrote:

>On Feb 11, 1:44 am, (E-Mail Removed) wrote:

>> Maybe someone can explain the math in the (de)conversion?

>There's not really much math involved, you're just swapping around

>pointers.
There is some elementary math involved.

The original code had,

>>>unsigned char examplearray[] = {4, 3, 2, 1};

>>>unsigned int exampleint = *(unsigned int *)examplearray;
This presumes that 'unsigned int' is the same size as 4 unsigned

char, which can also be expressed as sizeof(unsigned int) == 4.

An unsigned char is always at least 8 bits, so unsigned int in

this code is presumed to be at least 32 bits wide. This is a

non-portable assumption: 'unsigned long' should be used

instead of 'unsigned int', as unsigned long is guaranteed to

be at least 32 bits, but unsigned int might be as small as 16 bits.

It is possible in C to have a 32 bit unsigned int or unsigned long

and yet for sizeof(int) to not be 4: for example it is legal in

C for unsigned char itself to be 32 bits and sizeof(unsigned int) == 1.

Real systems with such characteristics exist -- and the code would

completely break on them.
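
Those size assumptions can be checked rather than presumed. A minimal
sketch, assuming only the standard <limits.h> and <stddef.h> headers; the
function names are made up for illustration and are not part of the
original code:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if an unsigned long occupies exactly
   4 chars of 8 bits each, which is the layout the |4|3|2|1| discussion
   takes for granted; returns 0 otherwise. */
int four_octet_ulong(void)
{
    return CHAR_BIT == 8 && sizeof(unsigned long) == 4;
}

/* Hypothetical helper: storage bits occupied by an unsigned long.
   sizeof counts chars, not 8-bit octets, so multiply by CHAR_BIT. */
size_t ulong_storage_bits(void)
{
    return sizeof(unsigned long) * CHAR_BIT;
}
```

On a system where unsigned long is 64 bits, four_octet_ulong() returns 0,
which is a reminder that "at least 32 bits" does not mean "exactly 4 bytes".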

When unsigned char examplearray[] = {4, 3, 2, 1}; then C guarantees

that the 4, 3, 2, 1 will be stored in memory in increasing address

order. If I use | to mark the end of bytes in increasing memory order,

examplearray would end up holding |4|3|2|1| in that order.

When (unsigned int *)examplearray is done (note I removed the

leading * from the expression), the resulting pointer will be

a pointer to unsigned int, and it will point to the beginning of

that memory area, |4|3|2|1| . The * in front of the pointer expression,

*(unsigned int *)examplearray "dereferences" that pointer, so

unsigned int exampleint will be an unsigned int loaded from memory

that was initialized to |4|3|2|1| .
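
One caution about the cast itself: the array need not be aligned suitably
for an unsigned int, and it is being read through a type other than its
own, so the C standard does not guarantee the dereference works. A sketch
of an alternative that copies the same bytes with memcpy (load_uint is a
hypothetical name, not something from the original code):

```c
#include <string.h>

/* Hypothetical helper: copies the first sizeof(unsigned int) bytes of
   src into an unsigned int. The caller must supply at least that many
   bytes. The numeric result still depends on the processor's byte
   order, exactly as with the cast. */
unsigned int load_uint(const unsigned char *src)
{
    unsigned int value;
    memcpy(&value, src, sizeof value);
    return value;
}
```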

Now this is the part that starts getting complicated: the -numeric-

significance of each byte of the |4|3|2|1| for the purposes of

unsigned int, is not necessarily going to be in the same order

as the bytes are written in memory.

On some systems ("big endian systems") the numeric order -would- be in

exactly that order, and the numeric value of the unsigned int would be

(4 << (3*CHARBIT)) + (3 << (2*CHARBIT)) + (2 << (1*CHARBIT)) + (1 << (0*CHARBIT))

where CHARBIT is the number of bits in a char (CHAR_BIT from <limits.h>

in real C; typically 8 but could be more.) Using a non-C notation for a

moment where ** represents exponentiation, this would be

4 * (2**CHARBIT)**3 + 3 * (2**CHARBIT)**2 + 2 * (2**CHARBIT)**1 + 1 * (2**CHARBIT)**0

which is exactly parallel to traditional decimal (base 10) notation

in which the base 10 number 4321 means

4 * 10**3 + 3 * 10**2 + 2 * 10**1 + 1 * 10**0
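
The big endian interpretation can be computed portably, without any
pointer cast, by treating each byte as one digit. A minimal sketch,
assuming unsigned long is at least 4*CHAR_BIT bits wide (true when
CHAR_BIT is 8); big_endian_value is a name invented here:

```c
#include <limits.h>

/* Hypothetical helper: interprets bytes[0..3] with bytes[0] as the most
   significant digit, in base 2**CHAR_BIT.
   Assumes unsigned long has at least 4*CHAR_BIT bits. */
unsigned long big_endian_value(const unsigned char bytes[4])
{
    unsigned long value = 0;
    int i;

    for (i = 0; i < 4; i++)
        value = (value << CHAR_BIT) | bytes[i];   /* shift in next digit */
    return value;
}
```

With CHAR_BIT 8, the bytes |4|3|2|1| give 0x04030201 on every processor,
because the arithmetic is done on values rather than on memory layout.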

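However, there are other systems in which the numeric significance

of the |4|3|2|1| bytes loaded from memory would be assigned completely

differently. On a plain "little endian" system the first byte in memory

is the least significant, as if the bytes loaded as the decimal number

1234. Two mixed variations (sometimes called "middle endian") would be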

(3 << (3*CHARBIT)) + (4 << (2*CHARBIT)) + (1 << (1*CHARBIT)) + (2 << (0*CHARBIT))

and

(2 << (3*CHARBIT)) + (1 << (2*CHARBIT)) + (4 << (1*CHARBIT)) + (3 << (0*CHARBIT))

which could be respectively written (in non-C notation) as

3 * (2**CHARBIT)**3 + 4 * (2**CHARBIT)**2 + 1 * (2**CHARBIT)**1 + 2 * (2**CHARBIT)**0

and

2 * (2**CHARBIT)**3 + 1 * (2**CHARBIT)**2 + 4 * (2**CHARBIT)**1 + 3 * (2**CHARBIT)**0

which would have analogs in base 10 as if the byte stream |4|3|2|1|

loaded from memory as the decimal numbers 3412 or 2143 respectively.
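
All of these orderings differ only in which digit position each memory
offset is assigned, so one routine can model any of them. A sketch under
the assumption that unsigned long has at least 4*CHAR_BIT bits; the
function name and the order[] table are illustrative inventions:

```c
#include <limits.h>

/* Hypothetical helper: order[i] is the digit position (0 = least
   significant, 3 = most significant) given to the byte at memory
   offset i. Assumes unsigned long has at least 4*CHAR_BIT bits. */
unsigned long load_with_order(const unsigned char bytes[4],
                              const int order[4])
{
    unsigned long value = 0;
    int i;

    for (i = 0; i < 4; i++)
        value |= (unsigned long)bytes[i] << (order[i] * CHAR_BIT);
    return value;
}
```

For |4|3|2|1| with CHAR_BIT 8, the table {3,2,1,0} models the big endian
case (0x04030201), {2,3,0,1} gives the "3412" value 0x03040102, and
{1,0,3,2} gives the "2143" value 0x02010403.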

These different ways of assigning relative numeric significance to

streams of bytes in memory are not wrong, they are just different,

and as long as the program is consistent about which order is used

there is no problem (except when talking to other systems that

use different orders.)
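
When a program does have to talk to other systems, it can find out at run
time where its own processor puts the least significant byte. A rough
sketch; the enum and function names are hypothetical, and the probe only
distinguishes the two common cases, lumping everything else together:

```c
#include <string.h>

/* Hypothetical classification of where the least significant byte of
   an unsigned long lands in memory. */
enum byte_order { ORDER_LSB_FIRST, ORDER_MSB_FIRST, ORDER_OTHER };

enum byte_order probe_byte_order(void)
{
    unsigned long probe = 1UL;           /* only the lowest digit is set */
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe); /* look at the stored bytes */
    if (bytes[0] == 1)
        return ORDER_LSB_FIRST;          /* lowest address holds the 1 */
    if (bytes[sizeof probe - 1] == 1)
        return ORDER_MSB_FIRST;          /* highest address holds the 1 */
    return ORDER_OTHER;                  /* some mixed ordering */
}
```

In practice it is usually simpler to avoid the question entirely by
building and splitting values with shifts, which behave the same on every
processor.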

Pentium-type processors use plain little-endian ordering (least

significant byte first); some processors such as MIPS R4000/R10000/R12000 etc.

use "big-endian" orderings. If you work with more than 2 distinct

processor architectures, you will probably encounter different

"endian" orderings at some point.

Now, when the process is reversed and the character array is

populated with the unsigned long value, the processor will take

the numeric value it has in the processor, and will write a sequence

of bytes into memory. The order that it does that writing in

need not be "most significant byte first" (that is, it need not be

the byte that denotes the highest numeric value that gets written

first). It could be -- "big endian" systems write in that order

for example. But lots of other systems write in some other order

(perhaps for some attempt to maintain compatibility with

the original 8 bit processors in their family lines). Whatever

order the processor uses to write values to memory will be the

exact mirror of the order that it loads from memory with,

so if the numeric order that it picked up from loading |4|3|2|1|

from memory was

3 * (2**CHARBIT)**3 + 4 * (2**CHARBIT)**2 + 1 * (2**CHARBIT)**1 + 2 * (2**CHARBIT)**0

then whatever current value it has to deal with will be written

reflecting that value order, producing |4|3|2|1| in memory.

With this ordering, if the current value it had in the processor was

141 * (2**CHARBIT)**3 + 17 * (2**CHARBIT)**2 + 92 * (2**CHARBIT)**1 + 29 * (2**CHARBIT)**0

then to maintain consistency with the loads, the bytes it would

write into memory would be |17|141|29|92| .

Now, no matter what order was used to determine numeric significance upon

load, the storage will undo the effect for the same value,

so no matter what order your processor uses internally, loading

|4|3|2|1| from memory into an unsigned long and storing it again

is going to result in |4|3|2|1| (assuming that sizeof(unsigned long) == 4).
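
The "store mirrors load" argument can be modelled directly. This is only
a sketch: load_digits, store_digits and the order[] table are invented for
illustration, and unsigned long is assumed to have at least 4*CHAR_BIT
bits:

```c
#include <limits.h>

/* order[i] is the digit position (0 = least significant) given to the
   byte at memory offset i. Both helpers are hypothetical. */
unsigned long load_digits(const unsigned char bytes[4], const int order[4])
{
    unsigned long value = 0;
    int i;

    for (i = 0; i < 4; i++)
        value |= (unsigned long)bytes[i] << (order[i] * CHAR_BIT);
    return value;
}

/* Writes each digit of value back to the offset it would be loaded from. */
void store_digits(unsigned long value, const int order[4],
                  unsigned char bytes[4])
{
    int i;

    for (i = 0; i < 4; i++)
        bytes[i] = (unsigned char)((value >> (order[i] * CHAR_BIT))
                                   & UCHAR_MAX);
}
```

With the table {2,3,0,1} and CHAR_BIT 8, storing the value whose digits
are 141, 17, 92, 29 produces |17|141|29|92|, and storing
load_digits(|4|3|2|1|) produces |4|3|2|1| again, whichever table is used.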

So the matter is more complex than just "manipulating pointers",

but the mathematics involved ends up cancelling itself out if you

load and then store the same value. If you had, for example, added

1 to the unsigned long and then stored the result back into memory,

you might have ended up with |4|3|2|2| or with |4|3|3|1| or with

|4|4|2|1| or with |5|3|2|1| and the mathematics involved would

help describe that. And if you were working with CHARBIT 8

and you had (say) |4|255|255|1| and were to add 1 to the unsigned long

storage of that, you would need the mathematics shown above to understand

the results you might get.
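
To make the carry case concrete, here is a sketch that loads four bytes
under a chosen ordering, adds 1, and stores the result back. The function
name and the order[] convention (order[i] is the digit position of the
byte at offset i, 0 being least significant) are assumptions made for
this example, as is an unsigned long of at least 4*CHAR_BIT bits:

```c
#include <limits.h>

/* Hypothetical helper: increments the 4-byte number held in bytes[],
   interpreting and rewriting it according to order[]. */
void add_one_in_place(unsigned char bytes[4], const int order[4])
{
    unsigned long value = 0;
    int i;

    /* load: place each byte at its digit position */
    for (i = 0; i < 4; i++)
        value |= (unsigned long)bytes[i] << (order[i] * CHAR_BIT);

    value += 1;

    /* store: write each digit back where it came from */
    for (i = 0; i < 4; i++)
        bytes[i] = (unsigned char)((value >> (order[i] * CHAR_BIT))
                                   & UCHAR_MAX);
}
```

Starting from |4|255|255|1| with CHAR_BIT 8, the big endian table
{3,2,1,0} yields |4|255|255|2|, while the "3412" table {2,3,0,1} yields
|4|255|0|2|, because there the 255 at offset 2 is the lowest digit and the
carry moves into the byte at offset 3.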

For any given number of bits per char, there are 24 different values

that |4|3|2|1| might get loaded as an unsigned long, depending upon

the processor. A few processors, such as the ARM, are able to use

different memory storage orderings depending on the state of a flag.

(The MIPS Rx000 processors can as well, but it is more typical to

hard-wire the order bit so that it is constant for any one MIPS

motherboard.)
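
The figure of 24 is just the number of ways to assign 4 bytes to 4 digit
positions (4*3*2*1). A brute-force sketch that counts the distinct values
|4|3|2|1| can load as, assuming unsigned long has at least 4*CHAR_BIT
bits; count_distinct_loads is a name invented for this example:

```c
#include <limits.h>

/* Hypothetical helper: tries every assignment of memory offsets to
   digit positions and counts how many different values result. */
int count_distinct_loads(void)
{
    static const unsigned char bytes[4] = {4, 3, 2, 1};
    unsigned long seen[24];              /* at most 4! distinct values */
    int count = 0;
    int p[4], i, j;

    for (p[0] = 0; p[0] < 4; p[0]++)
    for (p[1] = 0; p[1] < 4; p[1]++)
    for (p[2] = 0; p[2] < 4; p[2]++)
    for (p[3] = 0; p[3] < 4; p[3]++) {
        unsigned long value = 0;
        int is_permutation = 1;
        int is_new = 1;

        /* skip assignments that reuse a digit position */
        for (i = 0; i < 4; i++)
            for (j = i + 1; j < 4; j++)
                if (p[i] == p[j])
                    is_permutation = 0;
        if (!is_permutation)
            continue;

        for (i = 0; i < 4; i++)
            value |= (unsigned long)bytes[i] << (p[i] * CHAR_BIT);

        /* record the value if it has not been seen before */
        for (i = 0; i < count; i++)
            if (seen[i] == value)
                is_new = 0;
        if (is_new)
            seen[count++] = value;
    }
    return count;
}
```

Because the four bytes are all different, every permutation gives a
different value and the function returns 24.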
