Velocity Reviews - Computer Hardware Reviews



Name to method?

 
 
superheathen@yahoo.ca
 
      02-11-2008
Hi

I'm reading from a database that stores information as an integer
representing a char array of ints, it is created in the following
way:

unsigned char examplearray[] = {4, 3, 2, 1};
unsigned int exampleint = *(unsigned int *)examplearray;

and back again using:

unsigned char examplearray2[4];
*(unsigned int*)examplearray2 = exampleint;

it works, I just have no clue how it works. Does this technique have a
name so I can look into it?
 
 
 
 
 
Walter Roberson
 
      02-11-2008
In article <(E-Mail Removed)>,
<(E-Mail Removed)> wrote:

>I'm reading from a database that stores information as an integer
>representing a char array of ints, it is created in the following
>way:


>unsigned char examplearray[] = {4, 3, 2, 1};
>unsigned int exampleint = *(unsigned int *)examplearray;


>and back again using:


>unsigned char examplearray2[4];
>*(unsigned int*)examplearray2 = exampleint;


>it works, I just have no clue how it works. Does this technique have a
>name so I can look into it?


It is sometimes called "type punning".
--
"Pray do not take the pains / To set me right. /
In vain my faults ye quote; / I wrote as others wrote /
On Sunium's hight." -- Walter Savage Landor
 
 
 
 
 
superheathen@yahoo.ca
 
      02-11-2008
I'm still stuck.

Maybe someone can explain the math in the (de)conversion?
 
 
ts.death.angel@gmail.com
 
      02-11-2008
On Feb 11, 1:44 am, (E-Mail Removed) wrote:
> i'm still suck.
>
> Maybe someone can explain the math in the (de)conversion?


There's not really much math involved, you're just swapping around
pointers. Here's a rewrite that may be easier to understand:
unsigned char examplearray[] = {4, 3, 2, 1};
unsigned int *pointerint = examplearray; /* pointerint points to
examplearray (which is 32-bits; the size of an int) */
unsigned int exampleint = *pointerint; /* sets new int exampleint to
what pointerint points to */
 
 
superheathen@yahoo.ca
 
      02-11-2008
On Feb 10, 4:59 pm, (E-Mail Removed) wrote:
> On Feb 11, 1:44 am, (E-Mail Removed) wrote:
>
> > i'm still suck.
> >
> > Maybe someone can explain the math in the (de)conversion?
>
> There's not really much math involved, you're just swapping around
> pointers. Here's a rewrite that may be easier to understand:
> unsigned char examplearray[] = {4, 3, 2, 1};
> unsigned int *pointerint = examplearray; /* pointerint points to
> examplearray (which is 32-bits; the size of an int) */
> unsigned int exampleint = *pointerint; /* sets new int exampleint to
> what pointerint points to */

There's got to be a little, in the example I provided the integer
isn't a pointer or array. (though I tried using the example you
provided and got:
warning: initialization from incompatible pointer type) Using some
casting, it somehow takes the array and converts it to 16909060 (how
does it get this number?) and then using 16909060 is able to
reconstruct the array. To me it'd make more sense if it did use
pointers instead of the casting.

Sorry if I'm too dense to get what you're getting at.
 
 
superheathen@yahoo.ca
 
      02-11-2008
On Feb 10, 5:27 pm, (E-Mail Removed) wrote:
> [...]
> To me it'd make more sense if it did use
> pointers instead of the casting.


also, if it were some sort of memory location, wouldn't it be
subject to change with each compile, leaving the program unable to
read the database?
 
 
Walter Roberson
 
      02-11-2008
In article <(E-Mail Removed)>,
<(E-Mail Removed)> wrote:
>On Feb 11, 1:44 am, (E-Mail Removed) wrote:


>> Maybe someone can explain the math in the (de)conversion?


>There's not really much math involved, you're just swapping around
>pointers.



There is some elementary math involved.

The original code had,

>>>unsigned char examplearray[] = {4, 3, 2, 1};
>>>unsigned int exampleint = *(unsigned int *)examplearray;


This presumes that 'unsigned int' is the same size as 4 unsigned
char, which can also be expressed as sizeof(unsigned int) == 4.

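An unsigned char is always at least 8 bits, so unsigned int in
this code is presumed to be at least 32 bits wide. This is a
non-portable assumption: unsigned int might be as small as 16 bits,
while unsigned long is guaranteed to be at least 32 bits, so
'unsigned long' is the safer of the two. (It is not necessarily 4
bytes either: unsigned long is 64 bits on many 64-bit systems, and
C99's uint32_t from <stdint.h>, where provided, expresses "exactly
32 bits" directly.)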

It is possible in C to have a 32 bit unsigned int or unsigned long
and yet for sizeof(unsigned int) not to be 4: for example, it is legal
in C for unsigned char itself to be 32 bits, making sizeof(unsigned int) == 1.
Real systems with such characteristics exist, and the code would
completely break on them.

When unsigned char examplearray[] = {4, 3, 2, 1}; then C guarantees
that the 4, 3, 2, 1 will be stored in memory in increasing address
order. If I use | to mark the end of bytes in increasing memory order,
examplearray would end up holding |4|3|2|1| in that order.

When (unsigned int *)examplearray is done (note I removed the
leading * from the expression), the resulting pointer will be
a pointer to unsigned int, and it will point to the beginning of
that memory area, |4|3|2|1| . The * in front of the pointer expression,
*(unsigned int *)examplearray "dereferences" that pointer, so
unsigned int exampleint will be an unsigned int loaded from memory
that was initialized to |4|3|2|1| .

Now this is the part that starts getting complicated: the -numeric-
significance of each byte of the |4|3|2|1| for the purposes of
unsigned int, is not necessarily going to be in the same order
as the bytes are written in memory.

On some systems ("big endian" systems) the numeric order -would- be
exactly that order, and the numeric value of the unsigned int would be
(4 << (3*CHARBIT)) + (3 << (2*CHARBIT)) + (2 << (1*CHARBIT)) + (1 << (0*CHARBIT))
where CHARBIT is the number of bits in a char (CHAR_BIT in <limits.h>,
typically 8 but could be more). Using a non-C notation for a moment
where ** represents exponentiation, this would be
4 * (2**CHARBIT)**3 + 3 * (2**CHARBIT)**2 + 2 * (2**CHARBIT)**1 + 1 * (2**CHARBIT)**0
which with 8-bit chars is 67305985. This is exactly parallel to
traditional decimal (base 10) notation, with 2**CHARBIT in place of 10:
the base 10 number 4321 means
4 * 10**3 + 3 * 10**2 + 2 * 10**1 + 1 * 10**0

However, there are other systems in which the numeric order of the
|4|3|2|1| bytes would be loaded from memory completely differently.
On "little endian" systems the byte at the lowest address is the
least significant, giving

(1 << (3*CHARBIT)) + (2 << (2*CHARBIT)) + (3 << (1*CHARBIT)) + (4 << (0*CHARBIT))

which with 8-bit chars is 16909060, the number reported earlier in
this thread. Two further "mixed" (sometimes called "middle endian")
variations would be

(3 << (3*CHARBIT)) + (4 << (2*CHARBIT)) + (1 << (1*CHARBIT)) + (2 << (0*CHARBIT))
and
(2 << (3*CHARBIT)) + (1 << (2*CHARBIT)) + (4 << (1*CHARBIT)) + (3 << (0*CHARBIT))

which could be respectively written (in non-C notation) as

3 * (2**CHARBIT)**3 + 4 * (2**CHARBIT)**2 + 1 * (2**CHARBIT)**1 + 2 * (2**CHARBIT)**0
and
2 * (2**CHARBIT)**3 + 1 * (2**CHARBIT)**2 + 4 * (2**CHARBIT)**1 + 3 * (2**CHARBIT)**0

These three orderings have base 10 analogs as if the byte stream
|4|3|2|1| were loaded from memory as the decimal numbers 1234, 3412
or 2143 respectively.

These different ways of assigning relative numeric significance to
streams of bytes in memory are not wrong, they are just different,
and as long as the program is consistent about which order is used
there is no problem (except when talking to other systems that
use different orders.)

Pentium-type (x86) processors use the little-endian ordering;
some processors such as MIPS R4000/R10000/R12000 etc. typically
run with "big-endian" ordering. If you work with more than 2 distinct
processor architectures, you will probably encounter different
"endian" orderings at some point.


Now, when the process is reversed and the character array is
populated with the unsigned long value, the processor will take
the numeric value it holds and write a sequence of bytes into memory.
The order in which it writes them need not be "most significant
byte first" (that is, the byte that denotes the highest numeric
value need not be the one written to the lowest address). It could
be: "big endian" systems write in that order, for example. But lots
of other systems write in some other order (perhaps in an attempt to
maintain compatibility with the original 8 bit processors in their
family lines). Whatever order the processor uses to write values to
memory will be the exact mirror of the order it loads from memory with,
so if the numeric order that it picked up from loading |4|3|2|1|
from memory was
3 * (2**CHARBIT)**3 + 4 * (2**CHARBIT)**2 + 1 * (2**CHARBIT)**1 + 2 * (2**CHARBIT)**0
then whatever current value it has to deal with will be written
reflecting that value order, producing |4|3|2|1| in memory.
With this ordering, if the current value it held were
141 * (2**CHARBIT)**3 + 17 * (2**CHARBIT)**2 + 92 * (2**CHARBIT)**1 + 29 * (2**CHARBIT)**0
then to maintain consistency with the loads, the bytes it would
write into memory would be |17|141|29|92| .

Now, no matter what order was used to determine numeric significance upon
load, the store will undo the effect for the same value,
so no matter what order your processor uses internally, loading
|4|3|2|1| from memory into an unsigned long and storing it again
is going to result in |4|3|2|1| (assuming that sizeof(unsigned long) == 4).

So the matter is more complex than just "manipulating pointers",
but the mathematics involved ends up cancelling itself out if you
load and then store the same value. If you had, for example, added
1 to the unsigned long and then stored the result back into memory,
you might have ended up with |4|3|2|2| or with |4|3|3|1| or with
|4|4|2|1| or with |5|3|2|1| and the mathematics involved would
help describe that. And if you were working with CHARBIT 8
and you had (say) |4|255|255|1| and were to add 1 to the unsigned long
storage of that, you would need the mathematics shown above to understand
the results you might get.

For any given number of bits per char, there are 24 (that is, 4!)
different values that |4|3|2|1| could in principle be loaded as in an
unsigned long, one per ordering of the four bytes, depending upon
the processor. A few processors, such as the ARM, are able to use
different memory storage orderings depending on the state of a flag.
(The MIPS Rx000 processors can as well, but it is more typical to
hard-wire the order bit so that it is constant for any one MIPS
motherboard.)
--
This is a Usenet signature block. Please do not quote it when replying
to one of my postings.
http://en.wikipedia.org/wiki/Signature_block
 
 
superheathen@yahoo.ca
 
      02-11-2008
On Feb 10, 6:45 pm, (E-Mail Removed)-cnrc.gc.ca (Walter Roberson)
wrote:
> There is some elementary math involved.
> [...]


excellent, appreciated tons.
 
 
Jack Klein
 
      02-11-2008
On Sun, 10 Feb 2008 16:02:36 -0800 (PST), (E-Mail Removed) wrote
in comp.lang.c:

> Hi
>
> I'm reading from a database that stores information as an integer
> representing a char array of ints, it is created in the following
> way:
>
> unsigned char examplearray[] = {4, 3, 2, 1};
> unsigned int exampleint = *(unsigned int *)examplearray;
>
> and back again using:
>
> unsigned char examplearray2[4];
> *(unsigned int*)examplearray2 = exampleint;
>
> it works, I just have no clue how it works. Does this technique have a
> name so I can look into it?


It might happen to "work" for your expectation of "work" on the
particular platform where you are using it. The C standard makes no
such guarantee, because the behavior is undefined. On some platforms,
if "examplearray" does not have the proper alignment, trying to access
it as an unsigned int will generate a hardware trap.

What you have here is an example of poorly written code by a
programmer who isn't anywhere near as knowledgeable as he/she thinks.

--
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://c-faq.com/
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.club.cc.cmu.edu/~ajo/docs/FAQ-acllc.html
 
 
Kenny McCormack
 
      02-11-2008
In article <(E-Mail Removed)>,
Jack Klein <(E-Mail Removed)> blathered:
....
>What you have here is an example of poorly written code by a
>programmer who isn't anywhere near as knowledgeable as he/she thinks.


Yeah, that guy, Linus Torvalds, a real idiot. Probably lost his first
(and only) programming job - probably homeless and out on the street by now.

Yeah, I hear he used to do a lot of that sort of thing - type punning,
and god knows what else.

 
 
 
 
