Velocity Reviews > Cross-platform way to pack (int + flags) to unsigned int

Cross-platform way to pack (int + flags) to unsigned int

Alex J

 06-10-2013
Hi all,

Given two values: int X and int F and assuming that
(X << 2) >> 2 == X and F is a two-bit value, write cross-platform code to "pack" X and F to unsigned int.

I've solved this as follows.
Packing:
unsigned int P = (unsigned int) ((X << 2) | F);

Unpacking:
int F = ((int) P) & 3;
int A = ((int) P) >> 2;

I know that both the packing and the unpacking code is not valid from an ISO C point of view, but still the question is - should I care about it?

The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.

How can this problem be solved in a truly cross-platform way? Especially assuming the packed data will be read on multiple platforms, and that int and unsigned int are 32 bits wide on all of them.

P.S.:
#include <stdio.h>
#include <stdlib.h>

void check_int(int a, int f) {
    int a1;
    int f1;
    unsigned int p = (unsigned int) (a << 2) | f;

    a1 = ((int) p) >> 2;
    f1 = ((int) p) & 3;

    if ((a1 != a) || (f1 != f)) {
        fprintf(stderr, "a1(%d) != a(%d) || f1(%d) != f(%d)\n", a1, a, f1, f);
        exit(-1);
    } else {
        fprintf(stdout, "OK: %d, %d\n", a, f);
    }
}

int main() {
    check_int(1, 0);
    check_int(144555666, 3);
    check_int(-1, 2);
    check_int(-222333444, 1);
    return 0;
}

Paul N

 06-10-2013
On Jun 10, 8:27 pm, Alex J <(E-Mail Removed)> wrote:
> Hi all,
>
> Given two values: int X and int F and assuming that
> (X << 2) >> 2 == X and F is a two-bit value, write cross-platform code to "pack" X and F to unsigned int.
>
> I've solved this as follows.
> Packing:
> unsigned int P = (unsigned int) ((X << 2) | F);
>
> Unpacking:
> int F = ((int) P) & 3;
> int A = ((int) P) >> 2;
>
> I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?

I'm not an expert, but I thought that messing around with unsigned
values is perfectly safe. So I would be inclined to write:

Packing:
unsigned int P = (( (unsigned int) X << 2) | (unsigned int) F );

Unpacking:
int F = (int) (P & 3);
int A = (int) (P >> 2);

But there are experts in this group who can give a more authoritative
answer.

Barry Schwarz

 06-10-2013
On Mon, 10 Jun 2013 12:27:43 -0700 (PDT), Alex J <(E-Mail Removed)>
wrote:

>Hi all,
>
>Given two values: int X and int F and assuming that
>(X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.
>
>I've solved this as follows.
>Packing:
>unsigned int P = (unsigned int) ((X << 2) | F);
>
>Unpacking:
>int F = ((int) P) & 3;
>int A = ((int) P) >> 2;
>
>I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?

In what way do you think it is invalid?

>
>The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.
>
>How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.

How is the data being transferred between platforms, as binary or
text? If text, you should not have a problem.

If binary, will all the system always have the same endianness? If
one is big-endian, then f will be stored in the last two bits of the
fourth byte. When read on a little endian system, it will look for f
in the last two bits of the first byte.

--
Remove del for email

Lew Pitcher

 06-10-2013
On Monday 10 June 2013 16:59, in comp.lang.c, Barry Schwarz wrote:

> On Mon, 10 Jun 2013 12:27:43 -0700 (PDT), Alex J <(E-Mail Removed)>
> wrote:
>
>>Hi all,
>>
>>Given two values: int X and int F and assuming that
>>(X << 2) >> 2 == X and F is a two-bit value write cross-platform code to
>>"pack" X and F to unsigned int.
>>
>>I've solved this as follows.
>>Packing:
>>unsigned int P = (unsigned int) ((X << 2) | F);
>>
>>Unpacking:
>>int F = ((int) P) & 3;
>>int A = ((int) P) >> 2;
>>
>>I know that both packing and unpacking code is not valid from ISO C point
>>of view but still the question is - should I care about it?

>
> In what way do you think it is invalid?
>
>>
>>The only compiler I care about is GCC ver.>4 and its targets, e.g.
>>windows, linux, mac os x.
>>
>>How this problem can be solved in a truly cross-platform way? Especially
>>assuming the packed data will be read on the multiple platforms and
>>sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.

>
> How is the data being transferred between platforms, as binary or
> text? If text, you should not have a problem.

Caveat: Text shouldn't be a problem IF both writer and reader agree on the
character set of the data, the native line termination format of the data,
the block data format of the data (i.e., for IBM mainframe, is the data
Fixed Blocked, Variable Blocked, Variable Blocked Spanned, Undefined or
something else), etc.

> If binary, will all the system always have the same endianness? If
> one is big-endian, then f will be stored in the last two bits of the
> fourth byte. When read on a little endian system, it will look for f
> in the last two bits of the first byte.

In other words, a common data exchange format must be decided upon and
agreed to by all writers and readers /before/ the programs are developed,
coded, and tested.

--
Lew Pitcher
"In Skills, We Trust"

Eric Sosman

 06-10-2013
On 6/10/2013 3:27 PM, Alex J wrote:
> Hi all,
>
> Given two values: int X and int F and assuming that
> (X << 2) >> 2 == X and F is a two-bit value write cross-platform code to "pack" X and F to unsigned int.

Let's pause a moment to study the X condition more closely.
If it's meant as a portable ("cross-platform") statement, it
implies 0 <= X && X <= INT_MAX/4 (otherwise, simply evaluating
X<<2 would yield undefined behavior). On the 32-bit systems you
mention below, this means 0 <= X && X <= 0x1FFFFFFF. Keep the
range restriction in mind for what follows.

> I've solved this as follows.
> Packing:
> unsigned int P = (unsigned int) ((X << 2) | F);

Okay; the cast is unnecessary but harmless.

> Unpacking:
> int F = ((int) P) & 3;
> int A = ((int) P) >> 2;

Okay; again, the casts are unnecessary but harmless. (But
see below for a lame excuse that semi-justifies the second one.)

> I know that both packing and unpacking code is not valid from ISO C point of view but still the question is - should I care about it?

As long as the range restrictions hold, there's nothing invalid
about it.

> The only compiler I care about is GCC ver.>4 and its targets, e.g. windows, linux, mac os x.
>
> How this problem can be solved in a truly cross-platform way? Especially assuming the packed data will be read on the multiple platforms and sizeof(int) == sizeof(unsigned int) == 32 on all of these platforms.

Depends what you mean by "truly cross-platform." The assumption
of a 32-bit int excludes a few platforms right away, and there may be
a few more where gcc isn't available. Still, a large majority of
"mainstream" platforms meet your requirements.

There's another problem lurking, though: Endianness. If the 32-bit
int is composed of four 8-bit bytes, there are 4! = 24 ways to arrange
those bytes;[*] two arrangements ("Big-Endian" and "Little-Endian") are
popular today, and at least one more ("Middle-Endian") has been seen
in the past. If X and F are one million and one, respectively, you'll
pack them as the value 4000001, forming the four bytes 00 3D 09 01 (in
hex). A Big-Endian machine would transmit or store these with the 00
first and the 01 last, but a Little-Endian machine would do things the
other way around. So if one of them writes the value (in its native
order) and the other reads it (in *its* native order), 4000001 will
be misinterpreted as 17382656, from which you'll extract X = 4345664
and F = 0. The fidelity leaves a little to be desired!

That's not to say these problems can't be dealt with, just that
you'd better give them some thought. See the FAQ.
[*] Actually, there are 32! ~= 2.6E35 ways to arrange the bits.
Consider yourself fortunate that nobody's quite that perverse. Yet.

> P.S.:
> #include <stdio.h>
> #include <stdlib.h>
>
> void check_int(int a, int f) {
> int a1;
> int f1;
> unsigned int p = (unsigned int) (a << 2) | f;
>
> a1 = ((int) p) >> 2;
> f1 = ((int) p) & 3;
>
> if ((a1 != a) || (f1 != f)) {
> fprintf(stderr, "a1(%d) != a(%d) || f1(%d) != f(%d)\n", a1, a, f1, f);
> exit(-1);

Don't Do That. Use EXIT_FAILURE. (Does anybody *know* where
this exit(-1) meme got started? Does anybody know of *any* system
on which a -1 exit status survives unchanged all the way to the point
where an invoker could examine it? I think it *might* have worked
on VMS -- but it would have meant "success" if it did.)

> } else {
> fprintf(stdout, "OK: %d, %d\n", a, f);
> }
> }
>
> int main() {
> check_int(1, 0);
> check_int(144555666, 3);
> check_int(-1, 2);
> check_int(-222333444, 1);

The final two tests are on shaky ground, as they violate the
range restrictions mentioned earlier. (On the other hand, they
also -- sort of -- justify some of the casts you've written: If
X<<2 with X negative doesn't explode *and* (int)p with p outside
the range of int doesn't explode *and* (int)p>>2 with (int)p
negative doesn't explode, then the cast *might* save the day.
But I wouldn't call that "truly cross-platform.")

> return 0;
> }
>

Another approach might be to use a struct with bit-fields:

struct packed {
    int X : 30;
    unsigned int F : 2;
};

This doesn't solve the representation issues -- if anything, it
makes them trickier -- but it relaxes the range restriction to
permit negative X'es.

--
Eric Sosman
(E-Mail Removed)d

Alex J

 06-11-2013
On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
> On 6/10/2013 3:27 PM, Alex J wrote:
> [snip]
> Let's pause a moment to study the X condition more closely.
> If it's meant as a portable ("cross-platform") statement, it
> implies 0 <= X && X <= INT_MAX/4 (otherwise, simply evaluating
> X<<2 would yield undefined behavior). On the 32-bit systems you
> mention below, this means 0 <= X && X <= 0x1FFFFFFF. Keep the
> range restriction in mind for what follows.

Yes, you're absolutely right. But I need signed types too.

> As long as the range restrictions hold, there's nothing invalid
> about it.

ISO C99 (6.5.7p4) makes the behavior undefined for a left shift of a negative value.

> There's another problem lurking, though: Endianness.

Yes, you're right. I am aware of it and I planned to "document" the little-endian representation of the transmitted binary data (as on x86).

> That's not to say these problems can't be dealt with, just that
> you'd better give them some thought. See the FAQ.
> [snip]
> Don't Do That. Use EXIT_FAILURE. [snip]

Thanks for pointing that out.

> The final two tests are on shaky ground, as they violate the
> range restrictions mentioned earlier. [snip]
> But I wouldn't call that "truly cross-platform.")

Yes, you're right. But I need signed integers.
Maybe there is a reliable way to transform to/from a packed binary number representation - i.e. flags + number (e.g. network byte order)? After some quick googling I did not find one, and now I believe I shouldn't do it.

I need quick loading and saving of big packs of binary data on the same platform (writing a big array of unsigned ints), so I believe I should provide a special converter for big-endian platforms. Conversion will be a rare though theoretically possible case, so I do not care about its speed and memory consumption.

Is it sufficient to have two converters: a little->big endian format converter and a big->little endian format converter?

AFAIK all known 32-bit platforms (well, better to say platforms with 32-bit ints) with the same endianness share the *same* binary representation of ints, and all bitwise operations on integers have the same effect? Oh, I forgot to mention that at the moment I care about GCC only, but support for the other modern compilers - msvc, icc - would be nice.

If that is true, at least I can rely on the packing/unpacking operations I specified above on both big- and little-endian platforms, and write converters that are aware of endianness. Of course, the endianness information will be encoded in the header of the transmitted binary representation.

> Another approach might be to use a struct with bit-fields:
>
>     struct packed {
>         int X : 30;
>         unsigned int F : 2;
>     };
>
> This doesn't solve the representation issues -- if anything, it
> makes them trickier -- but it relaxes the range restriction to
> permit negative X'es.

Thank you and all who answered.

> --
> Eric Sosman

Alex J

 06-11-2013
On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
> [snip]
> Another approach might be to use a struct with bit-fields:
>
> struct packed {
> int X : 30;
> unsigned int F : 2;
> };
>
> This doesn't solve the representation issues -- if anything, it
> makes them trickier -- but it relaxes the range restriction to
> permit negative X'es.

I heard that bit-fields are non-portable and that there is no guarantee the compiler will not apply some padding or alignment to the struct; that's why I didn't use them.

I am probably wrong, but even with #pragma pack(1) the struct is not guaranteed to be 32 bits in size - put simply, sizeof(struct packed) will not always be 4. Yet I'm not sure about that.

Please correct me if I'm wrong.

>
> --
> Eric Sosman

James Kuyper

 06-11-2013
On 06/11/2013 06:24 AM, Alex J wrote:
> On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
>> [snip]
>> Another approach might be to use a struct with bit-fields:
>>
>> struct packed {
>> int X : 30;
>> unsigned int F : 2;
>> };
>>
>> This doesn't solve the representation issues -- if anything, it
>> makes them trickier -- but it relaxes the range restriction to
>> permit negative X'es.

>
> I heard that bit fields are non-portable and there is no guarantee that compiler will not apply some alignment to the struct that's why I didn't use it.

That's what he meant when he said "it makes them trickier".

> I am probably wrong but even with pragma pack(1) struct is not guaranteed to be 32-bit size or simply said sizeof(struct packed) will not always be 4. Yet I'm not sure on that.

#pragma pack itself is not standard, so the standard guarantees nothing
about how it works on those implementations which support it - and the
ones that do support it do so with several different incompatible
syntaxes for specifying the way the structures are packed.

To avoid undefined behavior during packing, you'll have to transform
valid values for X into non-negative numbers, convert to unsigned, and then
perform the left shift. For unpacking, you need to perform the
inverse operations in the opposite order. There are several different
ways to make the numbers non-negative. One of the simplest is:

#define INT30_MIN (-(1 << 29))

// Packing
p = (unsigned)(x - INT30_MIN) << 2 | f;

// Unpacking
f = p & 3;
x = (int)(p >> 2) + INT30_MIN;

The code would have to be a bit more complicated if you want it to work
on systems where int and unsigned int are not both 32 bit types. You'll
still have to deal with byte ordering when reading or writing the
packed values.
--
James Kuyper

Eric Sosman

 06-11-2013
On 6/11/2013 4:35 AM, Alex J wrote:
> On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
>> [...]
>> As long as the range restrictions hold, there's nothing invalid

>
> ISO C99 (6.5.7/4) - undefined behavior for left shift for negative value.

You began with

Given two values: int X and int F and assuming that
(X << 2) >> 2 == X and F is a two-bit value

.... which means either that X is non-negative and not too large,
or that you're *not* worried about 6.5.7p4! If 6.5.7p4 is in
force, your stated premise already rules out negative X.

> [...]
> Maybe there is a reliable way to transform to/from a packed binary number representation - i.e. flags + number (e.g. network byte order)?
> After some quick googling I did not find one, and now I believe I shouldn't do it.

One fully-portable approach would be to add a suitable offset
to X before encoding, ensuring that what's shifted is non-negative:

unsigned int encoded = ((unsigned int)(X + OFFSET) << 2) | F;

Then you subtract the same offset when extracting:

int decodedX = (int)(encoded >> 2) - OFFSET;
int decodedF = encoded & 3;

> I need quick loading and saving of big packs of binary data on the same platform (writing a big array of unsigned ints), so I believe I should provide a special converter for big-endian platforms. Conversion will be a rare though theoretically possible case, so I do not care about its speed and memory consumption.

You've confused me. When you say "on the same platform," it seems
that you want code that will work everywhere, but that packing and
extracting all happen on the same system; in this case, endianness is
not an issue. But when you talk about a "converter for big endian
platforms," it seems that data exchange between variegated platforms
is in fact needed ...

Either way, it's easy to read and write the data in a consistent
"wire format" regardless of the host platform's endianness. Here's
how you could write a four-byte value in Little-Endian order:

// Error-checking omitted for brevity
unsigned int value = ...;
putc(value & 0xFF, stream);
putc((value >> 8) & 0xFF, stream);
putc((value >> 16) & 0xFF, stream);
putc((value >> 24) & 0xFF, stream);

If you're certain of 32-bitness you could omit the final &0xFF, but
any speedup would surely be negligible compared to the I/O. Then
you can read the bytes back the same way:

unsigned int b0 = getc(stream);
unsigned int b1 = getc(stream);
unsigned int b2 = getc(stream);
unsigned int b3 = getc(stream);
unsigned int value = (b3 << 24) + (b2 << 16) + (b1 << 8) + b0;

.... or even

int X = (b3 << 22) + (b2 << 14) + (b1 << 6) + (b0 >> 2)
- OFFSET;
int F = b0 & 3;

> Is it sufficient to have two converters: a little->big endian format converter and a big->little endian format converter?

As illustrated above, I think it suffices to have zero converters.

> AFAIK all known 32-bit platforms (well, better to say platforms with 32-bit ints) with the same endianness share the *same* binary representation of ints, and all bitwise operations on integers have the same effect? Oh, I forgot to mention that at the moment I care about GCC only, but support for the other modern compilers - msvc, icc - would be nice.

This sounds like a digression; I'm not sure what you're driving at.
Negative numbers, maybe? You need to avoid them anyhow, because even if
all the platforms use two's complement (it's been years since I saw one
that didn't) you still need to worry about getting the sign right when
extracting. Right-shifting a negative int is formally implementation-defined;
in practice, some platforms duplicate the sign bit while others introduce
zeros (giving a non-negative result).

> If that is true, at least I can rely on the packing/unpacking operations I specified above on both big- and little-endian platforms, and write converters that are aware of endianness. Of course, the endianness information will be encoded in the header of the transmitted binary representation.

Or just read and write the same "wire format" everywhere.

--
Eric Sosman
(E-Mail Removed)d

Eric Sosman

 06-11-2013
On 6/11/2013 6:24 AM, Alex J wrote:
> On Tuesday, June 11, 2013 1:33:21 AM UTC+4, Eric Sosman wrote:
>> [snip]
>> Another approach might be to use a struct with bit-fields:
>>
>> struct packed {
>> int X : 30;
>> unsigned int F : 2;
>> };
>>
>> This doesn't solve the representation issues -- if anything, it
>> makes them trickier -- but it relaxes the range restriction to
>> permit negative X'es.

>
> I heard that bit fields are non-portable and there is no guarantee that compiler will not apply some alignment to the struct that's why I didn't use it.

Like much of C, bit-fields are portable within limits. Every C
compiler supports bit-fields, with widths up to at least the width
of an int -- since you're assuming 32-bit ints, the :30 bit-field is
fine. The compiler has a lot of freedom in how it chooses to store
the bits (which is why I said the representation issues get trickier),
but if you're only worried about intra-machine storage that's not a
problem.

> I am probably wrong but even with pragma pack(1) struct is not guaranteed to be 32-bit size or simply said sizeof(struct packed) will not always be 4. Yet I'm not sure on that.

Correct: As I said, the compiler has a lot of freedom. As for
#pragma pack(1) -- Well, once you've uttered a non-Standard #pragma,
*nothing* is guaranteed by the C language.

--
Eric Sosman
(E-Mail Removed)d