Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Storing 4 characters using 3 bytes

Reply
Thread Tools

Storing 4 characters using 3 bytes

 
 
Chris Torek
Guest
Posts: n/a
 
      10-15-2003
In article <bmkdmb$n8b4a$(E-Mail Removed)-berlin.de>
cody <(E-Mail Removed)> writes:
>typedef union
>{
> char chars[3];
> struct
> {
> int _1 : 6;
> int _2 : 6;
> int _3 : 6;
> int _4 : 6;
> } BITS
>
>} PACKED


Even once the syntax errors are fixed, the code simply does not
work at all on the SPARC. Try it and see.

(You will find that c[0] is never changed. This is a hint as to
what is going wrong. )
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
 
Reply With Quote
 
 
 
 
Irrwahn Grausewitz
Guest
Posts: n/a
 
      10-15-2003
"cody" <(E-Mail Removed)> wrote:

>typedef union
>{
> char chars[3];
> struct
> {
> int _1 : 6;
> int _2 : 6;
> int _3 : 6;
> int _4 : 6;
> } BITS
>
>} PACKED


If you'd followed the thread "How to extract bytes from a long?"
you'd already know this is calling for nasal demons.

Regards
--
Irrwahn
((E-Mail Removed))
 
Reply With Quote
 
 
 
 
cody
Guest
Posts: n/a
 
      10-15-2003
> >typedef union
> >{
> > char chars[3];
> > struct
> > {
> > int _1 : 6;
> > int _2 : 6;
> > int _3 : 6;
> > int _4 : 6;
> > } BITS
> >
> >} PACKED

>
> Even once the syntax errors are fixed, the code simply does not
> work at all on the SPARC. Try it and see.


Missing semicolons? Sorry.

> (You will find that c[0] is never changed. This is a hint as to
> what is going wrong. )


Why? Structure padding? I don't think the struct BITS will be padding bits
inserted. added at the end maybe but that is not of interest here.

--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk


 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      10-16-2003
becte wrote:
>
> I need to use three bytes to store four 6-bit integers (4 * 6 = 3 *
> like this
> 11111122|22223333|33444444
> Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
> and the output is int o1,o2,o3, range 0 .. 2^8-1
> How to do this in a clever way?
>
> (The 6 bits integers represent characters in range A-Z and 0-9)


You can do better. You can store three of those chars into 16
bits. You should include at least a blank, for 37 chars each
represented by a value in 0 through 39

You store it by: i = 40 * (40 * c3 + c2) + c1. You extract by
dividing by 40 and taking the remainder.

This is a very old mechanism, and is at the root of 6 char
filenames in bygone years. Look up Rad50 (the 50 is in octal).

--
Chuck F ((E-Mail Removed)) ((E-Mail Removed))
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!


 
Reply With Quote
 
Chris Torek
Guest
Posts: n/a
 
      10-16-2003
[regarding a union of "char" array and bitfields, I wrote:]
>> Even once the syntax errors are fixed, the code simply does not
>> work at all on the SPARC. Try it and see.


Actually, I picked the wrong machine; I now think it *would* work
(depending on desired output) on the SPARC. Ones on which it would
likely fail badly are certain Motorola 680x0 implementations.

In article <bmkjf3$i4f7p$(E-Mail Removed)-berlin.de>
cody <(E-Mail Removed)> writes:
>Missing semicolons? Sorry.


That, and the names "_1" through "_4" are dubious. (I would have
to consult the C standards to see whether those names are reserved
to users in the structure-member namespace. Instead of using names
whose form requires careful scrutiny of a C standard, why not use
names that are clearly off-limits to the implementor?)

>Why? Structure padding? I don't think the struct BITS will be padding bits
>inserted. added at the end maybe but that is not of interest here.


For discussion purposes, here is the proposed code, turned into
compilable C:

void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
union {
unsigned char c[3];
struct {
unsigned int one:6, two:6, three:6, four:6;
} bits;
} u;
u.bits.one = in1;
u.bits.two = in2;
u.bits.three = in3;
u.bits.four = in4;

out[0] = u.c[0];
out[1] = u.c[1];
out[2] = u.c[2];
}

Now, suppose we have a fairly typical (in today's terms) machine
with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
If you print out the size of the union, you will find that it
requires four bytes, not three.

These four bytes are laid out in memory in the following way,
as required by the C standards:

a) when viewed as array-of-char:
at offset 0: u.c[0] (one byte)
at offset 1: u.c[1] (one byte)
at offset 2: u.c[2] (one byte)
at offset 3: <unnamed> (one byte of padding)
(remember that arrays *must* be contiguous)

b) when viewed as unnamed structure with bitfields:
storage unit at offset 0: <four bytes wide, unspecified layout>

Part (b) is where the problem occurs. The layout of the bitfields
within the storage unit is unspecified. Suppose the compiler
chooses to start at "bit 0" of a 32-bit big-endian word. In this
case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
word, which live in the unnamed padding byte at offset 3 in the
array in u.c. u.bits.two occupies bits 6 through 11, which uses
two more bits in the unnamed byte and then four bits in the array
element u.c[3]. (At least one Motorola 68020 C compiler handles
-- or handled -- bitfields this way; I used it in the early Sun
days.)

As it happens, SPARC compilers in big-endian mode (which is to say
most of them) are "encouraged" to allocate bitfields starting at
bit 31 and working down towards bit 0. This means that u.bits.one
occupies bits 31 through 26 inclusive, which live in the array
element u.c[0].

On Intel x86 compilers, the situation is reversed: the usual
convention is to allocate bitfields starting at bit 0 and working
up towards bit 31. As it happens, Intel x86 CPUs are little-endian
-- so this means that u.bits.one occupies bits 0 through 5 inclusive,
which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
two low-order bits of u.bits.two) and then bits 0 through 3 of
u.c[1].

Of course, there is nothing but convention to prevent an x86 C
compiler from allocating bits from 31 down to 0, in which case
u.c[0] would not map to any of the bits in u.bits. In any case,
the output on the x86 seems a bit peculiar -- the bits are spread
over the bytes in an interesting manner:

#include <stdio.h>

int main(void) {
unsigned char buf[3];
pack(buf, 0x00, 0x3f, 0x00, 0x00);
printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
buf[0], buf[1], buf[2]);
return 0;
}

produces:

result: buf[0..3] = 0xc00f00

Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
we see that, indeed, buf[0] contains as its two *uppermost* bits
the two *lowermost* bits of the 2nd value.

(The result on the big-endian SPARC is probably closer to what was
expected.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
 
Reply With Quote
 
Dan Pop
Guest
Posts: n/a
 
      10-16-2003
In <bmkjf3$i4f7p$(E-Mail Removed)-berlin.de> "cody" <(E-Mail Removed)> writes:

>> >typedef union
>> >{
>> > char chars[3];
>> > struct
>> > {
>> > int _1 : 6;
>> > int _2 : 6;
>> > int _3 : 6;
>> > int _4 : 6;
>> > } BITS
>> >
>> >} PACKED

>>
>> Even once the syntax errors are fixed, the code simply does not
>> work at all on the SPARC. Try it and see.

>
>Missing semicolons? Sorry.
>
>> (You will find that c[0] is never changed. This is a hint as to
>> what is going wrong. )

>
>Why? Structure padding? I don't think the struct BITS will be padding bits
>inserted. added at the end maybe but that is not of interest here.


Clue: in what order do bit fields get allocated inside the storage unit?
What if this storage unit happens to be a 32-bit entity?

10 An implementation may allocate any addressable storage unit large
enough to hold a bit-field. If enough space remains,
a bit-field that immediately follows another bit-field in
a structure shall be packed into adjacent bits of the same
unit. If insufficient space remains, whether a bit-field that
does not fit is put into the next unit or overlaps adjacent
units is implementation-defined. The order of allocation of
bit-fields within a unit (high-order to low-order or low-order
to high-order) is implementation-defined. The alignment of the
addressable storage unit is unspecified.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: http://www.velocityreviews.com/forums/(E-Mail Removed)
 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      10-16-2003
Chris Torek wrote:
>
> [regarding a union of "char" array and bitfields, I wrote:]
> >> Even once the syntax errors are fixed, the code simply does
> >> not work at all on the SPARC. Try it and see.

>

.... snip ...
>
> For discussion purposes, here is the proposed code, turned into
> compilable C:
>
> void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
> union {
> unsigned char c[3];
> struct {
> unsigned int one:6, two:6, three:6, four:6;
> } bits;
> } u;
> u.bits.one = in1;
> u.bits.two = in2;
> u.bits.three = in3;
> u.bits.four = in4;
>
> out[0] = u.c[0];
> out[1] = u.c[1];
> out[2] = u.c[2];
> }

... snip about problems etc. ...

However, for this problem, simply work with values using
multiplication and division and all the problems and
non-portabilities go away. See my post elsethread.

--
Chuck F ((E-Mail Removed)) ((E-Mail Removed))
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!

 
Reply With Quote
 
becte
Guest
Posts: n/a
 
      10-16-2003
Eric Sosman <(E-Mail Removed)> wrote in message news:<(E-Mail Removed)>...
> becte wrote:
> >
> > I need to use three bytes to store four 6-bit integers (4 * 6 = 3 *
> > like this
> > 11111122|22223333|33444444
> > Suppose the input is, int c1, c2, c3, c4, range 0 .. 2^6 -1
> > and the output is int o1,o2,o3, range 0 .. 2^8-1
> > How to do this in a clever way?

>
> "Straightforward" suffices:
>
> o1 = (c1 << 2) | (c2 >> 4);
> o2 = ((c2 & 0xF) << 4) | (c3 >> 2);
> 03 = ((c3 & 0x3) << 2) | c4;
>


I go for the straightforward solution.
BTW, there is an error in the last line, should be
o3 = ((c3 & 0x3) << 6) | c4; // shift 6, not 2
 
Reply With Quote
 
cody
Guest
Posts: n/a
 
      10-16-2003
> Actually, I picked the wrong machine; I now think it *would* work
> (depending on desired output) on the SPARC. Ones on which it would
> likely fail badly are certain Motorola 680x0 implementations.


Why? byte order doesn't matter here.You put 3 bytes in and get 4 bytes out.

> That, and the names "_1" through "_4" are dubious. (I would have
> to consult the C standards to see whether those names are reserved
> to users in the structure-member namespace. Instead of using names
> whose form requires careful scrutiny of a C standard, why not use
> names that are clearly off-limits to the implementor?)


OK, I admit that _1 is a stupid identifier.

> >Why? Structure padding? I don't think the struct BITS will be padding

bits
> >inserted. added at the end maybe but that is not of interest here.

>
> For discussion purposes, here is the proposed code, turned into
> compilable C:
>
> void pack(unsigned char *out, int in1, int in2, int in3, int in4) {
> union {
> unsigned char c[3];
> struct {
> unsigned int one:6, two:6, three:6, four:6;
> } bits;
> } u;
> u.bits.one = in1;
> u.bits.two = in2;
> u.bits.three = in3;
> u.bits.four = in4;
>
> out[0] = u.c[0];
> out[1] = u.c[1];
> out[2] = u.c[2];
> }
>
> Now, suppose we have a fairly typical (in today's terms) machine
> with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".
> If you print out the size of the union, you will find that it
> requires four bytes, not three.
>
> These four bytes are laid out in memory in the following way,
> as required by the C standards:
>
> a) when viewed as array-of-char:
> at offset 0: u.c[0] (one byte)
> at offset 1: u.c[1] (one byte)
> at offset 2: u.c[2] (one byte)
> at offset 3: <unnamed> (one byte of padding)
> (remember that arrays *must* be contiguous)
>
> b) when viewed as unnamed structure with bitfields:
> storage unit at offset 0: <four bytes wide, unspecified layout>
>
> Part (b) is where the problem occurs. The layout of the bitfields
> within the storage unit is unspecified. Suppose the compiler
> chooses to start at "bit 0" of a 32-bit big-endian word. In this
> case, u.bits.one occupies bits 0 through 5 inclusive of that 32-bit
> word, which live in the unnamed padding byte at offset 3 in the
> array in u.c. u.bits.two occupies bits 6 through 11, which uses
> two more bits in the unnamed byte and then four bits in the array
> element u.c[3]. (At least one Motorola 68020 C compiler handles
> -- or handled -- bitfields this way; I used it in the early Sun
> days.)
>
> As it happens, SPARC compilers in big-endian mode (which is to say
> most of them) are "encouraged" to allocate bitfields starting at
> bit 31 and working down towards bit 0. This means that u.bits.one
> occupies bits 31 through 26 inclusive, which live in the array
> element u.c[0].
>
> On Intel x86 compilers, the situation is reversed: the usual
> convention is to allocate bitfields starting at bit 0 and working
> up towards bit 31. As it happens, Intel x86 CPUs are little-endian
> -- so this means that u.bits.one occupies bits 0 through 5 inclusive,
> which are also bits 0 through 5 of u.c[0]. u.bits.two occupies
> bits 6 through 11, which are in bits 6 and 7 of u.c[0] (for the
> two low-order bits of u.bits.two) and then bits 0 through 3 of
> u.c[1].
>
> Of course, there is nothing but convention to prevent an x86 C
> compiler from allocating bits from 31 down to 0, in which case
> u.c[0] would not map to any of the bits in u.bits. In any case,
> the output on the x86 seems a bit peculiar -- the bits are spread
> over the bytes in an interesting manner:
>
> #include <stdio.h>
>
> int main(void) {
> unsigned char buf[3];
> pack(buf, 0x00, 0x3f, 0x00, 0x00);
> printf("result: buf[0..3] = 0x%2.2x%2.2x%2.2x\n",
> buf[0], buf[1], buf[2]);
> return 0;
> }
>
> produces:
>
> result: buf[0..3] = 0xc00f00
>
> Packing the tuple <0x00, 0x03, 0x00, 0x00> results in 0xc00000, so
> we see that, indeed, buf[0] contains as its two *uppermost* bits
> the two *lowermost* bits of the 2nd value.
>
> (The result on the big-endian SPARC is probably closer to what was
> expected.)


Who says that the size of the struct will be 32bit? The standard just says:
[#9] An implementation may allocate any addressable storage
unit large enough to hold a bit-field. If enough space
remains, a bit-field that immediately follows another bit-
field in a structure shall be packed into adjacent bits of
the same unit. If insufficient space remains, whether a
bit-field that does not fit is put into the next unit or
overlaps adjacent units is implementation-defined. The
order of allocation of bit-fields within a unit (high-order
to low-order or low-order to high-order) is implementation-
defined. The alignment of the addressable storage unit is
unspecified.Additionally you can adjust the structure padding with
compiler options.

union {
unsigned char c[3*4];
struct {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
} u;

This should solve the problem because the union is now a multiple of 32 bits
in size so there will be no internal padding and byte order won't matter.

--
cody

[Freeware, Games and Humor]
www.deutronium.de.vu || www.deutronium.tk


 
Reply With Quote
 
Chris Torek
Guest
Posts: n/a
 
      10-17-2003
[I wrote, in part:]
>> Now, suppose we have a fairly typical (in today's terms) machine
>> with 8-bit "char"s and 4-byte "int"s and 4-byte "storage units".


In article <bmn2oj$orobt$(E-Mail Removed)-berlin.de>
cody <(E-Mail Removed)> wrote:
> Who says that the size of the struct will be 32bit?


I did, right in the text you quoted (and I retained) above. And
it is -- on today's typical implementations. It does not *have*
to be, but it *is*, right now. (On a few implementations the
storage-units are 16 bits, and on a few more they are 64 bits, and
there are no doubt some oddballs with 24 bits and the like out
there today. But 32 is pretty common at the moment.)

>The standard just says:
> [#9] An implementation may allocate any addressable storage
> unit large enough to hold a bit-field. If enough space
> remains, a bit-field that immediately follows another bit-
> field in a structure shall be packed into adjacent bits of
> the same unit. If insufficient space remains, whether a
> bit-field that does not fit is put into the next unit or
> overlaps adjacent units is implementation-defined. The
> order of allocation of bit-fields within a unit (high-order
> to low-order or low-order to high-order) is implementation-
> defined. The alignment of the addressable storage unit is
> unspecified.


Yes, this is what it says.

>Additionally you can adjust the structure padding with compiler options.


(If they exist.)

>
> union {
> unsigned char c[3*4];
> struct {
> unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
> unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
> unsigned int bit8:6, bit2:9, bit3:10, bit11:6;
> unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
> } bits;
> } u;
>
>This should solve the problem because the union is now a multiple of 32 bits
>in size so there will be no internal padding and byte order won't matter.


Will it? (Assume we have fixed the typos above -- bit2:9 and
bit3:10 should be bit9:6 and bit10:6. I let a compiler find this.)

"If enough space remains, a bit-field that immediately follows
another bit-field in a structure shall be packed into adjacent
bits of the same [storage] unit."

Suppose storage-units are 32 bits wide. Then 6+6+6+6+6 = 30, so
that the fields name "bit0" through "bit4" inclusive are packed
into the first 32-bit storage unit. The field named "bit5" now
needs another 6 bits, which is greater than the remaining space
(2 bits), so the Standard's wording continues:

"If insufficient space remains," -- and this is indeed the
case -- "whether a bit-field that does not fit is put into
the next unit or overlaps adjacent units is implementation-
defined."

So the implementation must document whether "bit5" is split across
two storage units, or the two leftover bits are skipped and then
"bit5" is placed in the next storage-unit.

Suppose the implementation's documentation says that bitfields are
*not* so split. In this case, "bit5" through "bit9" inclusive go
in the 2nd 32-bit storage-unit, "bit10" through "bit14" in the
third, and "bit15" occupies some implementation-defined portion
of the last 32-bit storage unit.

Thus, the "struct ... bits" member will need four 32-bit storage
units, or 16 bytes (remember that we are still assuming a common
8-bit-byte 32-bit-int machine). This is indeed a multiple of 32
bits -- but hardly the desired multiple, as illustrated by the
following program and its output:

#include <stdio.h>

union u {
unsigned char c[3*4];
struct s {
unsigned int bit0:6, bit1:6, bit2:6, bit3:6;
unsigned int bit4:6, bit5:6, bit6:6, bit7:6;
unsigned int bit8:6, bit9:6, bit10:6, bit11:6;
unsigned int bit12:6, bit13:6, bit14:6, bit15:6;
} bits;
};

int main(void) {
printf("sizeof union u = %lu\n",
(unsigned long)sizeof(union u));
printf("sizeof struct s = %lu\n",
(unsigned long)sizeof(struct s));
printf("sizeof unsigned char [3*4] = %lu\n",
(unsigned long)sizeof(unsigned char [3*4]));
return 0;
}

When compiled with "gcc -ansi -pedantic" and run, this prints:

sizeof union u = 16
sizeof struct s = 16
sizeof c[3*4] = 12

which means that the various bitfields were in fact padded out in
just the way I suggested above. The array, of course, has no
padding.

Ultimately, you *can* do what the original poster asked using unions
and bitfields -- but the result is inherently implementation
dependent, because of all those "implementation-defined" items
above; and it is surprisingly hard to get the right answers.
(Dennis Ritchie once called C's bitfields "a botch and a blemish",
and I think this helps illustrate why.) As a general rule, if you
want to do bitwise computations in C, it all works a lot better if
you use the bitwise operators instead of bitfield members of
structures. The results are far easier to predict -- thus easier
to get right -- and you have much more hope of producing portable,
or even just "mostly portable", code.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (4039.22'N, 11150.29'W) +1 801 277 2603
email: forget about it http://67.40.109.61/torek/index.html (for the moment)
Reading email is like searching for food in the garbage, thanks to spammers.
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
When using System.IO.FileStream, I write 8 bytes, then seek to the start of the file, does the 8 bytes get flushed on seek and the buffer become a readbuffer at that point instead of being a write buffer? DR ASP .Net 2 07-29-2008 09:50 AM
When using System.IO.FileStream, I write 8 bytes, then seek to the start of the file, does the 8 bytes get flushed on seek and the buffer become a readbuffer at that point instead of being a write buffer? DR ASP .Net Building Controls 0 07-29-2008 01:37 AM
Ratio of Bytes Delayed to Bytes Sent netproj Cisco 0 12-21-2005 08:08 PM
Private Bytes vs. # Bytes in all Heaps in Perfmon Jason Collins ASP .Net 3 02-18-2004 03:59 PM
Re: receiving Bytes and sending Bytes The Old Sourdough Computer Support 0 07-23-2003 01:23 PM



Advertisments