Velocity Reviews - Computer Hardware Reviews


platform independent serialization of a long

 
 
RA Scheltema
 
      01-23-2004
hi all,


A small question about serializing and deserializing a long in a platform
independent manner. Can this be done with the following code ?:


char buf[4];
long val = 35456;

/* serialize ... on for example intel */
buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);

/* deserialize ... on for example mac */
val = 0;
val = val | ((unsigned long) buf[0]) << 24;
val = val | ((unsigned long) buf[1]) << 16;
val = val | ((unsigned long) buf[2]) << 8;
val = val | ((unsigned long) buf[3]) << 0;


According to a colleague of mine, the & (in the first part of the code)
ensures that the least significant and most significant byte is always
intact on whatever platform the buffer is deserialized. I don't agree, any
suggestions ?


kind regards,
richard


 
tom_usenet
 
      01-23-2004
On Fri, 23 Jan 2004 12:37:23 +0100, "RA Scheltema"
<r.a.scheltema[viral][s][p]@[m]dacolian.nl> wrote:

>hi all,
>
>
>A small question about serializing and deserializing a long in a platform
>independent manner. Can this be done with the following code ?:


No, the code assumes that sizeof(long) == 4 (not true on some 64-bit
platforms), that CHAR_BIT == 8 (not true on some other platforms), that
all platforms store negative numbers the same way (not true on ones'
complement platforms, etc.), and that long uses all of its bits in the
value representation (no padding bits).
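One way to make those assumptions explicit, rather than silent, is to check them at compile time. This is a minimal C++11 sketch; the helper names `long_value_bits` and `long_fits_wire_format` are hypothetical, and `(-1 & 3) == 3` is a common idiom for detecting two's complement (which C++20 makes mandatory anyway).

```cpp
#include <climits>
#include <limits>

// Fail the build if bytes are not 8 bits wide, as the 4-byte buffer assumes.
static_assert(CHAR_BIT == 8, "wire format assumes 8-bit bytes");

// -1 & 3 evaluates to 3 in two's complement, 2 in ones' complement and 1 in
// sign-magnitude, so this rejects hosts that represent negatives differently.
static_assert((-1 & 3) == 3, "negative values assume two's complement");

// Number of bits that take part in the value of a long (value bits plus the
// sign bit). Padding bits, if any, are not counted.
constexpr int long_value_bits() {
    return std::numeric_limits<long>::digits + 1;
}

// The width of long legitimately differs (32 bits on many desktops, 64 on
// LP64 Unix systems), so report it rather than asserting on it.
constexpr bool long_fits_wire_format() {
    return long_value_bits() == 32;
}
```

On an LP64 system `long_fits_wire_format()` is false, which is the 64-bit case mentioned above; code that must interoperate there would have to move to a fixed-width type such as `std::uint32_t`.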

>char buf[4];
>long val = 35456;
>
>/* serialize ... on for example intel */
>buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
>buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
>buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
>buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
>
>/* deserialize ... on for example mac */
>val = 0;
>val = val | ((unsigned long) buf[0]) << 24;
>val = val | ((unsigned long) buf[1]) << 16;
>val = val | ((unsigned long) buf[2]) << 8;
>val = val | ((unsigned long) buf[3]) << 0;
>
>
>According to a colleague of mine, the & (in the first part of the code)
>ensures that the least significant and most significant byte is always
>intact on whatever platform the buffer is deserialized. I don't agree, any
>suggestions ?


Your colleague is correct. Note that the code assumes that all
platforms represent long the same way, apart from byte order. This isn't
true in general: e.g. sign-magnitude, ones' complement, 16-bit chars,
64-bit longs, etc. It is true on most 32-bit desktop platforms, though:
they have 8-bit chars, 32-bit longs and use two's complement for negative
numbers.

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 
Tom St Denis
 
      01-23-2004

"tom_usenet" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> >char buf[4];
> >long val = 35456;
> >
> >/* serialize ... on for example intel */
> >buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
> >buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
> >buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
> >buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
> >
> >/* deserialize ... on for example mac */
> >val = 0;
> >val = val | ((unsigned long) buf[0]) << 24;
> >val = val | ((unsigned long) buf[1]) << 16;
> >val = val | ((unsigned long) buf[2]) << 8;
> >val = val | ((unsigned long) buf[3]) << 0;
> >
> >
> >According to a colleague of mine, the & (in the first part of the code)
> >ensures that the least significant and most significant byte is always
> >intact on whatever platform the buffer is deserialized. I don't agree, any
> >suggestions ?

>
> Your colleague is correct. Note that the code assumes that all
> platforms represent long the same way, apart from byte order. This isn't
> true in general: e.g. sign-magnitude, ones' complement, 16-bit chars,
> 64-bit longs, etc. It is true on most 32-bit desktop platforms, though:
> they have 8-bit chars, 32-bit longs and use two's complement for negative
> numbers.


I don't see this as something that can fail [regardless of how the actual
data is stored]. If you have a type which is at least 32-bits then
val&0xFF000000UL is always "defined". All this means is that on platforms
where they store integer types using fluxums and kawalachums instead of bits
they will have to EMULATE!

It's just like platforms with no FPU or support for 32-bit types. They have
to emulate them with stuff they do have.

So yes, you can portably store/load any integer type in an array of unsigned
chars.
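For unsigned types the claim can be illustrated with a template that only shifts and masks values, never touching the memory layout. This is a C++11 sketch; `store_unsigned` and `load_unsigned` are hypothetical names. Each array element carries 8 bits, and both ends must agree on the type's width, which is the caveat raised later in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>

// Writes the value bits of an unsigned integer to out[], most significant
// byte first, using shifts and masks on the value only. Writes
// ceil(digits / 8) bytes and returns that count.
template <typename U>
std::size_t store_unsigned(U value, unsigned char* out) {
    static_assert(std::is_unsigned<U>::value, "unsigned types only");
    assert(out != nullptr);
    const int bits = std::numeric_limits<U>::digits;
    const std::size_t n = static_cast<std::size_t>((bits + 7) / 8);
    for (std::size_t i = 0; i < n; ++i) {
        // Element i (from the front) gets the i-th most significant 8 bits.
        out[i] = static_cast<unsigned char>((value >> (8 * (n - 1 - i))) & 0xFFu);
    }
    return n;
}

// Rebuilds the value by shifting each 8-bit chunk back in.
template <typename U>
U load_unsigned(const unsigned char* in) {
    static_assert(std::is_unsigned<U>::value, "unsigned types only");
    const int bits = std::numeric_limits<U>::digits;
    const std::size_t n = static_cast<std::size_t>((bits + 7) / 8);
    U value = 0;
    for (std::size_t i = 0; i < n; ++i) {
        value = static_cast<U>((value << 8) | in[i]);
    }
    return value;
}
```

Signed values would first be converted to the matching unsigned type; it is the conversion back to a signed type where the representation questions discussed in the replies come in.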

Tom


 
Martijn Lievaart
 
      01-23-2004
On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:

>
> I don't see this as something that can fail [regardless of how the actual
> data is stored]. If you have a type which is at least 32-bits then
> val&0xFF000000UL is always "defined". All this means is that on platforms
> where they store integer types using fluxums and kawalachums instead of bits
> they will have to EMULATE!


No, you are assuming that all computers use the same layout for binary
numbers. That assumption is not true. Computers that use ones' complement
(do these exist in reality any more?) store numbers in a different way
than computers using two's complement. If you use this method of
transporting between ones'- and two's-complement machines, it will only
work for positive numbers.

Also, transporting this way when there are more than 32 bits will lose
information. Again, this will not work for negative numbers, even in the
more common two's complement. And because the OP mentioned that this was
about transporting a long: there are machines out there that have 64-bit
longs.
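A defensive sketch for the 64-bit case: refuse to pack a long whose value does not fit in 32 bits, instead of silently losing the high-order bits. `pack_long32` is a hypothetical helper. The conversion to `std::uint32_t` is defined on the value (reduction modulo 2^32), so for in-range negative values the bytes are the 32-bit two's complement pattern whatever the host uses internally.

```cpp
#include <cassert>
#include <cstdint>

// Range of a 32-bit two's complement value on the wire.
constexpr long long kWireMin = -2147483648LL;
constexpr long long kWireMax = 2147483647LL;

// Returns false (and writes nothing) if v does not fit in 32 bits.
bool pack_long32(long v, unsigned char out[4]) {
    assert(out != nullptr);
    if (v < kWireMin || v > kWireMax) {
        return false;
    }
    // Value-based conversion: v modulo 2^32, independent of representation.
    const std::uint32_t u = static_cast<std::uint32_t>(v);
    out[0] = static_cast<unsigned char>((u >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((u >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((u >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(u & 0xFFu);
    return true;
}
```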

> It's just like platforms with no FPU or support for 32-bit types. They have
> to emulate them with stuff they do have.


Not a real comparison. We're talking about systems that have the required
integer types, but happen to store them differently. A better comparison
is to portably store/load floating point types. As the underlying
representations differ from implementation to implementation, this cannot
be done.

> So yes, you can portably store/load any integer type in an array of unsigned
> chars.


No, you can at most portably store/load positive integers. This is
guaranteed by both C and C++ IIRC. The C++ standard has some vague wording
on the requirements on integer types that boil down to "unsigned integer
types must use normal binary encoding, positive integers stored in signed
integer types must have the same bit pattern as their unsigned
counterpart". I don't have the C standard, but I know it has slightly
different wording that basically boils down to the same.

Now in practice, all computers nowadays use two's complement, so in
practice this will work
- between machines that use 32-bit longs.
- when your values are positive and have no more than 32 bits (provided
you zeroed out the extra bits beforehand).

HTH
M4

 
Martijn Lievaart
 
      01-23-2004
On Fri, 23 Jan 2004 12:37:23 +0100, RA Scheltema wrote:

> hi all,
>
>
> A small question about serializing and deserializing a long in a platform
> independent manner. Can this be done with the following code ?:
>
>
> char buf[4];
> long val = 35456;
>
> /* serialize ... on for example intel */
> buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
> buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
> buf[2] = (unsigned char) ((val & 0x0000ff00) >> ;
> buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);
>
> /* deserialize ... on for example mac */
> val = 0;
> val = val | ((unsigned long) buf[0]) << 24;
> val = val | ((unsigned long) buf[1]) << 16;
> val = val | ((unsigned long) buf[2]) << 8;
> val = val | ((unsigned long) buf[3]) << 0;
>
>
> According to a collegue of mine, the & (in the first part of the code)
> ensures that the least significant and most significant byte is always
> intact on whatever platform the buffer is deserialized. I don't agree, any
> suggestions ?


See my other reply in this thread on why this is a bad idea. It only works
in some situations.

Three other solutions come to mind.

- If your platform has htonl/ntohl (most do), they are an easy and much
more portable way to achieve the same.

- Use integer arithmetic instead of bitwise operations.

- My favorite: transport as text, not binary.
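The second suggestion, integer arithmetic instead of bitwise operations, might look like this. It is a sketch that assumes the value has already been brought into an unsigned type, because `/` and `%` on negative signed values produce negative quotients and remainders. `pack_arith` and `unpack_arith` are hypothetical names.

```cpp
#include <cassert>

// Serializes the low 32 bits of v, most significant byte first, using only
// division and remainder; no bitwise operators are involved.
void pack_arith(unsigned long v, unsigned char out[4]) {
    assert(out != nullptr);
    for (int i = 3; i >= 0; --i) {
        out[i] = static_cast<unsigned char>(v % 256);  // lowest remaining byte
        v /= 256;
    }
}

// Rebuilds the value with multiply-and-add (Horner's rule). Four bytes never
// exceed 0xFFFFFFFF, which always fits in unsigned long.
unsigned long unpack_arith(const unsigned char in[4]) {
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i) {
        v = v * 256 + in[i];
    }
    return v;
}
```

For unsigned operands this produces the same bytes as the shift-and-mask version; the arithmetic form simply makes it obvious that only values, not memory layouts, are involved.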

HTH,
M4

 
Tom St Denis
 
      01-23-2004

"Martijn Lievaart" <(E-Mail Removed)> wrote in message
news(E-Mail Removed) t.rtij.nl...
> On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:
>
> >
> > I don't see this as something that can fail [regardless of how the actual
> > data is stored]. If you have a type which is at least 32-bits then
> > val&0xFF000000UL is always "defined". All this means is that on platforms
> > where they store integer types using fluxums and kawalachums instead of bits
> > they will have to EMULATE!

>
> No, you are assuming that all computers use the same layout for binary
> numbers. That assumption is not true. Computers that use ones' complement
> (do these exist in reality any more?) store numbers in a different way
> than computers using two's complement. If you use this method of
> transporting between ones'- and two's-complement machines, it will only
> work for positive numbers.


I don't see that as being valid. "unsigned long" must have at least 32-bits
of precision.

By your logic

unsigned long x, y;

y = 255UL*256UL*256UL*256UL;
x = some_func();
x &= y;
x >>= 24;

Is undefined because x/y may not be a 2s complement?

WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of the
return of some_func(). In reality this "might use walazaums for bits"
comes into play if you memcpy or otherwise directly copy. So on a 1s
complement machine it would have to emulate as appropriate.

For example, ARMv4 processors don't have FPUs. By your logic

float x = 4.0;

is undefined?

> Also, transporting this way when there are more than 32 bits will lose
> information. Again, this will not work for negative numbers, even in the
> more common two's complement. And because the OP mentioned that this was
> about transporting a long: there are machines out there that have 64-bit
> longs.


Yeah you have to specify precision. However, many algorithms use fixed
precision (re: block ciphers).

Tom


 
Martijn Lievaart
 
      01-23-2004
On Fri, 23 Jan 2004 15:01:06 +0000, Tom St Denis wrote:

>> No, you are assuming that all computers use the same layout for binary
>> numbers. That assumption is not true. Computers that use ones' complement
>> (do these exist in reality any more?) store numbers in a different way
>> than computers using two's complement. If you use this method of
>> transporting between ones'- and two's-complement machines, it will only
>> work for positive numbers.

>
> I don't see that as being valid. "unsigned long" must have at least 32-bits
> of precision.


Yes.

>
> By your logic
>
> unsigned long x, y;


Hey, where did that unsigned creep in? Maybe you want to reread what I
said.

>
> y = 255UL*256UL*256UL*256UL;
> x = some_func();
> x &= y;
> x >>= 24;
>
> Is undefined because x/y may not be a 2s complement?


I said no such thing.

>
> WRONG. The value of X will lie in 0..255 and will be the bits 24..31 of


I'm not wrong, you are reading wrong. And please lose the caps, it's
annoying.

> the return of some_func(). In reality this "might use walazaums for
> bits" comes into play if you memcpy or otherwise directly copy. So on a
> 1s complement machine it would have to emulate as appropriate.


There is nothing to emulate on a ones' complement machine. It can just use
its native types, which happen to have different representations for
negative numbers than the more common two's complement. Completely valid
in both C and C++, no walazaums involved anywhere.

You might want to read up on what happens when converting negative signed
long values to unsigned long, because that is exactly what we are facing
here.
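The rule referred to here fits in a few lines. Converting a signed value to an unsigned type is defined on the value: the result is the value reduced modulo 2^N, where N is the width of the unsigned type, regardless of how the host stores negative numbers. The opposite direction (an out-of-range unsigned value converted to long) is implementation-defined before C++20, which is the remaining problem Dan Pop describes further down. The function names below are hypothetical.

```cpp
#include <climits>

// Defined by value: -1L becomes ULONG_MAX on every conforming implementation,
// whether the host uses two's complement, ones' complement or sign-magnitude.
unsigned long to_unsigned(long v) {
    return static_cast<unsigned long>(v);
}

// For v in the 32-bit range, the low 32 bits of that result are the 32-bit
// two's complement pattern of v, which is what ends up in the buffer.
unsigned long low32(long v) {
    return to_unsigned(v) & 0xFFFFFFFFUL;
}
```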

>
> For example, ARMv4 processors don't have FPUs. By your logic
>
> float x = 4.0;
>
> is undefined?


What twist of logic are you trying to achieve here? I'm positively baffled
by your conclusion, I cannot follow you.

>
>> Also, transporting this way when there are more than 32 bits will lose
>> information. Again, this will not work for negative numbers, even in the
>> more common two's complement. And because the OP mentioned that this was
>> about transporting a long: there are machines out there that have 64-bit
>> longs.

>
> Yeah you have to specify precision. However, many algorithms use fixed
> precision (re: block ciphers).


Obvious. When transporting between machines you'll always have to specify
the valid ranges.

M4

 
tom_usenet
 
      01-23-2004
On Fri, 23 Jan 2004 15:01:06 GMT, "Tom St Denis" <(E-Mail Removed)>
wrote:

>
>"Martijn Lievaart" <(E-Mail Removed)> wrote in message
>news(E-Mail Removed) rt.rtij.nl...
>> On Fri, 23 Jan 2004 13:39:04 +0000, Tom St Denis wrote:
>>
>> >
>> > I don't see this as something that can fail [regardless of how the actual
>> > data is stored]. If you have a type which is at least 32-bits then
>> > val&0xFF000000UL is always "defined". All this means is that on platforms
>> > where they store integer types using fluxums and kawalachums instead of bits
>> > they will have to EMULATE!

>>
>> No, you are assuming that all computers use the same layout for binary
>> numbers. That assumption is not true. Computers that use ones' complement
>> (do these exist in reality any more?) store numbers in a different way
>> than computers using two's complement. If you use this method of
>> transporting between ones'- and two's-complement machines, it will only
>> work for positive numbers.

>
>I don't see that as being valid. "unsigned long" must have at least 32-bits
>of precision.


He just said it is valid for positive numbers! What has "unsigned
long" got to do with negative numbers?

>
>By your logic
>
>unsigned long x, y;


Where did "unsigned long" come from? The OP was using "long".

>
>y = 255UL*256UL*256UL*256UL;
>x = some_func();
>x &= y;
>x >>= 24;
>
>Is undefined because x/y may not be a 2s complement?


2s complement doesn't apply to unsigned types. It is a convenient way
of representing negative numbers in binary.

Tom

C++ FAQ: http://www.parashift.com/c++-faq-lite/
C FAQ: http://www.eskimo.com/~scs/C-faq/top.html
 
Dan Pop
 
      01-23-2004
In <40110774$0$329$(E-Mail Removed)4all.nl> "RA Scheltema" <r.a.scheltema[viral][s][p]@[m]dacolian.nl> writes:

>A small question about serializing and deserializing a long in a platform
>independent manner. Can this be done with the following code ?:


It still assumes that longs are 32-bit entities (4 bytes x 8 bits) on
both platforms. There is no easy way of eliminating this assumption,
short of using a textual representation of the value, instead of a binary
one, i.e. serialise with sprintf and deserialise with sscanf and convert
the native strings to and from BCD (to also remove the assumption that
both platforms use the same character set).
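A minimal sketch of the textual route, covering only the print/parse step (the BCD conversion for character-set independence is not shown). It swaps in `std::snprintf` and `std::strtol` for sprintf/sscanf, because `strtol` reports malformed and out-of-range input through its end pointer and `errno`, whereas `sscanf` with `%ld` has undefined behaviour when the number does not fit. `long_to_text` and `text_to_long` are hypothetical names.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <string>

// Serializes a long as decimal text; the width of long no longer matters.
std::string long_to_text(long v) {
    char buf[64];  // ample for the decimal form of any common long width
    std::snprintf(buf, sizeof buf, "%ld", v);
    return buf;
}

// Parses the text back. Fails if the text is empty or malformed, or if the
// value does not fit in this platform's long.
bool text_to_long(const std::string& s, long* out) {
    errno = 0;
    char* end = nullptr;
    const long v = std::strtol(s.c_str(), &end, 10);
    if (end == s.c_str() || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *out = v;
    return true;
}
```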

>char buf[4];


MUST be unsigned char.
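Why this matters for the OP's own example: 35456 is 0x8A80, so the third byte is 0x8A. Read back through a signed char (plain char is signed on many compilers, including typical x86 ones), that byte is negative, and converting it to unsigned long sets every bit above bit 7. On a typical two's complement implementation the OP's deserialisation with a plain `char` buffer ends up storing -30080 in `val` instead of 35456. A sketch comparing the two element types; the function names are hypothetical.

```cpp
// Buggy variant: a negative byte converts to a huge unsigned long whose
// high bits then pollute the neighbouring bytes of the result.
unsigned long rebuild_from_signed(const signed char b[4]) {
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i) {
        v = v | (static_cast<unsigned long>(b[i]) << (24 - 8 * i));
    }
    return v;
}

// Same loop over unsigned char: each byte converts to 0..255, so nothing
// leaks outside its own 8-bit slot.
unsigned long rebuild_from_unsigned(const unsigned char b[4]) {
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i) {
        v = v | (static_cast<unsigned long>(b[i]) << (24 - 8 * i));
    }
    return v;
}
```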

>long val = 35456;


MUST be either an unsigned long or contain a positive value. Otherwise,
see below.

>/* serialize ... on for example intel */
>buf[0] = (unsigned char) ((val & 0xff000000) >> 24);
>buf[1] = (unsigned char) ((val & 0x00ff0000) >> 16);
>buf[2] = (unsigned char) ((val & 0x0000ff00) >> 8);
>buf[3] = (unsigned char) ((val & 0x000000ff) >> 0);


All the casts to unsigned char are superfluous.

>/* deserialize ... on for example mac */
>val = 0;
>val = val | ((unsigned long) buf[0]) << 24;


If the original value was negative, additional assumptions are needed:
both platforms use the same representation for negative values and the
conversion of an unsigned long value that cannot be represented by a long
preserves the bit pattern. Both assumptions are reasonable, but neither
is guaranteed by the language.

>val = val | ((unsigned long) buf[1]) << 16;
>val = val | ((unsigned long) buf[2]) << 8;
>val = val | ((unsigned long) buf[3]) << 0;
>
>According to a colleague of mine, the & (in the first part of the code)
>ensures that the least significant and most significant byte is always
>intact on whatever platform the buffer is deserialized. I don't agree, any
>suggestions ?


He is perfectly right. Because you're operating on the full
representation of the value, you can be sure that buf[0] will contain
the most significant byte of the value, regardless of the byte order.
And because the value is reconstructed using arithmetic operations,
you can also be sure that the result is correct, again regardless of the
byte order. But getting the byte order right is not enough if you need
to deal with negative values, too.
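The point that the shifts act on values rather than on memory can be checked with a fixed-width version of the OP's code. This sketch assumes the optional `std::uint32_t` type exists (it does on common platforms, and where it exists it has exactly 32 bits and no padding); `put_u32_be` and `get_u32_be` are hypothetical names.

```cpp
#include <cstdint>

// Big-endian ("network order") encoding. out[0] always receives the most
// significant byte, whether the host stores integers little- or big-endian.
void put_u32_be(std::uint32_t v, unsigned char out[4]) {
    out[0] = static_cast<unsigned char>((v >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(v & 0xFFu);
}

// Rebuilds the value arithmetically; again independent of host byte order.
std::uint32_t get_u32_be(const unsigned char in[4]) {
    return (static_cast<std::uint32_t>(in[0]) << 24) |
           (static_cast<std::uint32_t>(in[1]) << 16) |
           (static_cast<std::uint32_t>(in[2]) << 8) |
           static_cast<std::uint32_t>(in[3]);
}
```

Running these two functions on a little-endian and on a big-endian host produces the same four bytes for the same value, which is why the colleague's version works for non-negative values.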

The proper handling of negative values without the additional assumptions
mentioned above is easy if the implementation also supports long longs
or some other integer type that provides more than 32 bits. The
first step requires assigning val to uval, an unsigned long variable.
The result is independent of the way negative values are represented.
Serialise and deserialise uval. Then, on the receiving side:

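typedef long long big_t;

/* uval holds the deserialised 32-bit pattern. This step assumes a
   32-bit unsigned long, so that ULONG_MAX + 1 == 2^32. */
if ((uval & 0x80000000) != 0)
    val = (big_t)uval - (big_t)ULONG_MAX - 1;
else
    val = uval;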

As you can see, doing the job right, even in a way that is not 100%
platform independent, is more complex than just taking care of the byte order.
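A variant of that last step that does not depend on the width of `unsigned long`: subtract 2^32 explicitly, in a 64-bit signed type, instead of using `ULONG_MAX + 1`. This sketch assumes `std::int64_t` and `std::uint32_t` are available (they play the roles of `big_t` and of a 32-bit `uval`); `decode_i32` is a hypothetical name.

```cpp
#include <cstdint>

// Maps a 32-bit two's complement pattern back to its signed value using only
// arithmetic, so no out-of-range unsigned-to-signed conversion is involved.
long decode_i32(std::uint32_t u) {
    const std::int64_t wide = (u & 0x80000000u)
        ? static_cast<std::int64_t>(u) - 0x100000000LL  // pattern means u - 2^32
        : static_cast<std::int64_t>(u);
    return static_cast<long>(wide);  // always in [-2^31, 2^31 - 1], which long can hold
}
```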

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Sean Kelly
 
      01-23-2004
You might also want to look at the socket calls htonl() and ntohl().
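A sketch using those calls. `htonl` and `ntohl` are POSIX (declared in `<arpa/inet.h>`; Windows provides them via `<winsock2.h>`), not standard C or C++, and copying the converted value with `memcpy` assumes 8-bit bytes so that a `std::uint32_t` occupies exactly four of them. `put_u32_net` and `get_u32_net` are hypothetical names.

```cpp
#include <arpa/inet.h>  // POSIX: htonl, ntohl
#include <cstdint>
#include <cstring>

// Converts to network byte order (big-endian), then copies the bytes out.
void put_u32_net(std::uint32_t v, unsigned char out[4]) {
    const std::uint32_t net = htonl(v);
    std::memcpy(out, &net, sizeof net);
}

// Copies the bytes in, then converts from network to host byte order.
std::uint32_t get_u32_net(const unsigned char in[4]) {
    std::uint32_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```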


Sean
 

Similar Threads
Thread Thread Starter Forum Replies Last Post
Having compilation error: no match for call to (const __gnu_cxx::hash<long long int>) (const long long int&) veryhotsausage C++ 1 07-04-2008 05:41 PM
Platform Independent Forms/Programs with ASP.NET ibeetb ASP .Net 1 06-01-2004 05:29 PM
platform independent serialization of a long RA Scheltema C Programming 10 01-24-2004 02:02 PM
Re: truly abstract (platform independent) pathnames Harald Hein Java 9 08-17-2003 01:01 PM
Platform-independent way to refer to execute path MK Python 1 06-25-2003 05:43 PM


