Velocity Reviews - Computer Hardware Reviews


Re: Structure with unsigned chars and internal alignment

 
 
Eric Sosman
 
      08-31-2011
On 8/30/2011 6:05 PM, pozz wrote:
> I have a struct composed by two arrays of unsigned char.
>
> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
> };
>
> Is myStruct *always* 32 bytes long? Does field2 *always* start two
> bytes after the beginning of myStruct (i.e., is no padding allowed
> between field1 and field2)?


No and no. Many compilers will lay out the struct as you hope,
but none are under any obligation to do so. If you want thirty-two
consecutive bytes, consider `unsigned char field_both[32]'.
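A minimal sketch of the single-array alternative. The macro and function names (FIELD1_OFFSET, FIELD2_OFFSET, field1_of(), field2_of(), field2_size()) are hypothetical, chosen only for illustration. It relies on the fact that array elements are contiguous, so field2 begins exactly two bytes in on every conforming implementation, which is precisely what the two-member struct cannot promise.

```c
#include <stddef.h>

/* Hypothetical layout constants for one block kept in a plain byte array. */
#define BLOCK_MAXSIZE 32
#define FIELD1_OFFSET 0
#define FIELD1_SIZE   2
#define FIELD2_OFFSET (FIELD1_OFFSET + FIELD1_SIZE)

/* Array elements are contiguous, so no compiler may insert padding here:
   field1 is bytes 0..1 and field2 starts at byte FIELD2_OFFSET. */
unsigned char *field1_of(unsigned char *block)
{
    return block + FIELD1_OFFSET;
}

unsigned char *field2_of(unsigned char *block)
{
    return block + FIELD2_OFFSET;
}

/* Length of field2 in a block of block_size bytes
   (caller guarantees block_size >= FIELD2_OFFSET). */
size_t field2_size(size_t block_size)
{
    return block_size - FIELD2_OFFSET;
}
```

Usage might look like: declare `unsigned char field_both[BLOCK_MAXSIZE];`, read N bytes into it, then work with `field2_of(field_both)` and `field2_size(N)`.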

> In this case, in my application I'd like to read the size of myStruct
> (between 3 and 32) from a file. field1 will be always 2-bytes long,
> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
> no padding is present between field1 and field2).


You probably do *not* want to read "the size of myStruct" from
the file; you want to read "the size of some blob of data." The two
environments (on-disk form and in-memory form) are not necessarily
identical, even if they're strongly related by intention.

I said "probably," because perhaps your file actually does hold
"the size of myStruct." This could be the case if an actual `struct
myStruct' was written to the file originally, complete with whatever
padding it might have included. If you never, never need to move the
data to another system (not even for post-mortem analysis), you can
probably get away with this.

> I could allocate field2 array dynamically, but I haven't malloc/free on
> my embedded platform. So I decided to statically allocate the biggest
> size (32 bytes, 2 bytes for field1 and 30 bytes for field2).


It's not clear that the presence or absence of malloc() has
anything to do with the presence or absence of padding.

> If the real size of myStruct, read from the configuration file, is
> struct_size, how can I deduce the size of field2? Currently I use the
> following formula:
>
> field2_size = struct_size - 2


I guess `struct_size' is something you compute from the two-byte
array? Well, it matters not: The validity of the formula depends not
on C but on the program that wrote the file in the first place. What
formula did *that* program use?

(A literal reading of your question leads to the answer "The size
of field2 is thirty, always." This may sound nit-picky, but I have a
hunch that if you think about it hard enough you'll arrive at the
question you *should* be asking instead.)

> but I don't like it. It would be wrong if I decided to change the size
> of the field1 member, and it would be wrong if padding were present
> between the two fields. Maybe the following is better?
>
> field2_size = struct_size - offsetof(struct myStruct, field2)


Same problem: It could be right or wrong (or "not even wrong"),
because what you need to care about is who wrote the file and how.

--
Eric Sosman
(E-Mail Removed)
 
Eric Sosman
 
      09-01-2011
On 8/31/2011 3:07 AM, pozz wrote:
> On 31/08/2011 03:39, Eric Sosman wrote:
>>> In this case, in my application I'd like to read the size of myStruct
>>> (between 3 and 32) from a file. field1 will be always 2-bytes long,
>>> field2 will be the size of myStruct minus 2 bytes of field1 (in the case
>>> no padding is present between field1 and field2).

>>
>> You probably do *not* want to read "the size of myStruct" from
>> the file; you want to read "the size of some blob of data." The two
>> environments (on-disk form and in-memory form) are not necessarily
>> identical, even if they're strongly related by intention.

>
> This is my case.
>
>
>> I said "probably," because perhaps your file actually does hold
>> "the size of myStruct." This could be the case if an actual `struct
>> myStruct' was written to the file originally, complete with whatever
>> padding it might have included. If you never, never need to move the
>> data to another system (not even for post-mortem analysis), you can
>> probably get away with this.

>
> I understand your point and I expaling what I'm trying to do.


I'm not so sure you understand my points. For myself, I'm
*sure* I don't understand "expaling."

> I have to read a file, created by another application on another
> platform, that is composed of blocks of data (what you called "blob of
> data"). The size of these blocks (between 3 and 32) is written in the
> same file at the beginning.
> A single block is composed of 2 bytes followed by (block_size - 2) bytes.


You haven't said so, but I guess that the initial two bytes
somehow encode `block_size'. Whether the remaining bytes are all
"payload" or may themselves include padding may be known to you, but
remains a mystery to the rest of us.

> Because I don't know the size of blocks I'll read and I can't malloc the
> right size at run-time, I was trying to define the maximum size of
> block, splitting it in the two fields:


Again the hangup over the absence of malloc(). If this has
anything at all to do with the problem, it has to do with some aspect
of the problem that you have not yet revealed. Based on what you've
said and shown, the existence or non-existence of malloc() has zilch
to do with the matter.

> struct myStruct {
> unsigned char field1[2];
> unsigned char field2[30];
> };


... but here comes the "splitting it in the two fields" part,
which you're going about (as several people have told you) in an
unreliable way.

> I thought I could read the block and copy it directly into myStruct.
> However, if padding can be present in myStruct, I can't use this approach.
>
> Maybe the best approach is:
>
> #define FIELD1_OFFSET 0
> #define FIELD1_SIZE 2
> #define FIELD2_OFFSET FIELD1_SIZE
> #define BLOCK_MAXSIZE 32
> void read_block(struct myStruct *s, size_t block_size) {
> unsigned char block[BLOCK_MAXSIZE];
> <read BLOCK_MAXSIZE bytes and copy it in block array>
> memcpy(s->field1, &block[FIELD1_OFFSET], FIELD1_SIZE);
> memcpy(s->field2, &block[FIELD2_OFFSET], block_size - FIELD1_SIZE);
> }
>
> Here block_size is passed as an argument, because I don't know it in
> advance.


That's odd. Where do you learn `block_size', *before* reading
the first two bytes of your blob? And if `block_size' turns out to
be less than thirty-two, how does your "read BLOCK_MAXSIZE bytes"
avoid running off the end and into whatever follows the blob?

Observation: It is premature to seek the "best" way to do
something when you have not yet come up with "any" way. That's
premature optimization personified.
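A sketch of how pozz's read_block() could sidestep both objections. It assumes, as pozz explains later in the thread, that block_size comes from a two-byte header read earlier; the 0/1 return convention and the BLOCK_MINSIZE constant are assumptions of this sketch, not pozz's code. The key changes are validating block_size before touching field2 and reading exactly block_size bytes rather than BLOCK_MAXSIZE.

```c
#include <stdio.h>
#include <string.h>

#define FIELD1_SIZE   2
#define BLOCK_MINSIZE 3
#define BLOCK_MAXSIZE 32

struct myStruct {
    unsigned char field1[FIELD1_SIZE];
    unsigned char field2[BLOCK_MAXSIZE - FIELD1_SIZE];
};

/* Reads one block of exactly block_size bytes from fp into *s.
   block_size is assumed to have been read from the file header already.
   Returns 1 on success, 0 if block_size is out of range or the read is short.
   Bytes of s->field2 beyond block_size - FIELD1_SIZE are left untouched. */
int read_block(FILE *fp, struct myStruct *s, size_t block_size)
{
    unsigned char block[BLOCK_MAXSIZE];

    /* Reject sizes that would overflow field2 or leave it empty. */
    if (block_size < BLOCK_MINSIZE || block_size > BLOCK_MAXSIZE)
        return 0;

    /* Read only this block's bytes, so the next block stays in the stream. */
    if (fread(block, 1, block_size, fp) != block_size)
        return 0;

    memcpy(s->field1, block, FIELD1_SIZE);
    memcpy(s->field2, block + FIELD1_SIZE, block_size - FIELD1_SIZE);
    return 1;
}
```

Because the tail of field2 keeps its old contents, the caller has to remember block_size alongside the struct.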

 
BartC
 
      09-01-2011
"pozz" <(E-Mail Removed)> wrote in message
news:j3n70j$qrf$(E-Mail Removed)...

> Just to better explain the content of the file:
>
> - 2 bytes that code the size N of all the subsequent blocks
> - N bytes for block 1
> - N bytes for block 2
> - ...up to the end of the file
>
> A single N-byte block is composed of two fields:
> - 2 bytes for field1
> - N-2 bytes for field2
>
> field1 and field2 are application data that aren't important now for our
> discussion.


So why is it necessary to read each block of N bytes in one go?

Why not read 2 bytes into field1, then N-2 bytes into field2? Then problems
of padding and alignment will disappear.
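Bart's suggestion, sketched in C under the assumption that N has already been read from the file header. read_block_fields() is a hypothetical name. Each fread() targets one member, so whatever padding the compiler may put between or after the members is never written or read.

```c
#include <stdio.h>

struct myStruct {
    unsigned char field1[2];
    unsigned char field2[30];
};

/* Reads one N-byte block by filling the two members separately.
   n is the block size from the file header.
   Returns 1 on success, 0 on an out-of-range n or a short read. */
int read_block_fields(FILE *fp, struct myStruct *s, size_t n)
{
    size_t n2;

    /* field2 must get at least one byte and must not overflow. */
    if (n <= sizeof s->field1 || n > sizeof s->field1 + sizeof s->field2)
        return 0;
    n2 = n - sizeof s->field1;

    if (fread(s->field1, 1, sizeof s->field1, fp) != sizeof s->field1)
        return 0;
    if (fread(s->field2, 1, n2, fp) != n2)
        return 0;
    return 1;
}
```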

(And field1 does sound very much like a 16-bit numeric value; if it
is, it might as well be declared as one, making it easier to work with.)

--
Bartc

 
Keith Thompson
 
      09-01-2011
"BartC" <(E-Mail Removed)> writes:
> "pozz" <(E-Mail Removed)> wrote in message
> news:j3n70j$qrf$(E-Mail Removed)...
>
>> Just to better explain the content of the file:
>>
>> - 2 bytes that code the size N of all the subsequent blocks
>> - N bytes for block 1
>> - N bytes for block 2
>> - ...up to the end of the file
>>
>> A single N-byte block is composed of two fields:
>> - 2 bytes for field1
>> - N-2 bytes for field2
>>
>> field1 and field2 are application data that aren't important now for our
>> discussion.

>
> So why is it necessary to read each block of N bytes in one go?
>
> Why not read 2 bytes into field1, then N-2 bytes into field2? Then problems
> of padding and alignment will disappear.


Those problems also disappear if you just read each block into an N-byte
array.

> (And field1 does sound very much like a 16-bit numeric value; if it
> is, it might as well be declared as one, making it easier to work with.)


It's stored in big-endian format, so you can't just read it directly
into a 16-bit integer object (unless you use htons() and/or ntohs()).
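A portable way to decode the big-endian size field without htons()/ntohs() (which come from POSIX networking headers, not standard C), sketched with a hypothetical helper be16_decode(). It assumes each byte of the file lands in one unsigned char, as it does on the usual CHAR_BIT == 8 platforms.

```c
/* Decodes a 16-bit unsigned value stored big-endian (most significant byte
   first).  The shift operates on values, not on memory layout, so the result
   is the same on big- and little-endian hosts. */
unsigned int be16_decode(const unsigned char b[2])
{
    return ((unsigned int)b[0] << 8) | b[1];
}
```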

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Keith Thompson
 
      09-01-2011
pozz <(E-Mail Removed)> writes:
[...]
> I think I understood your point of view. Data in the file can be of two
> types:
> - blobs of data exactly N bytes in size
> - myStruct previously written to the file by the same software
> on the same platform (so with the same layout of padding and data)
> I'm in the first case. I know the file contains a sequence of blobs of
> the same size N. N is constant for a file, but may vary from file to
> file. So the software should be ready to read blobs of 10 or 15 or 30
> or 32 bytes.
> How can the software know the size of the blobs in the file? There are two
> bytes at the beginning of the file (just once, and *not* for each
> blob) encoded as a 16-bit unsigned integer in big-endian order.


[...]

> Just to better explain the content of the file:
>
> - 2 bytes that code the size N of all the subsequent blocks
> - N bytes for block 1
> - N bytes for block 2
> - ...up to the end of the file
>
> A single N-byte block is composed of two fields:
> - 2 bytes for field1
> - N-2 bytes for field2
>
> field1 and field2 are application data that aren't important now for our
> discussion.


Given that the maximum size is only 32 bytes, I probably wouldn't
use malloc() even if it were available; the overhead is likely to
exceed any savings from allocating, say, 24 bytes rather than 32.

Here's how I'd approach it:

    Read 2 bytes from the file.
    Compute N = (byte0 << 8) + byte1
    loop
        Read N bytes into a 32-byte buffer (unsigned char buf[32])
        Bytes 0..1 contain field1
        Bytes 2..N-1 contain field2

(I've omitted any error checking.)
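Keith's outline translated into C, as a sketch. The function names read_header() and read_next_block() are hypothetical, and the range check 3 <= N <= 32 is taken from pozz's description of the file. Unlike the outline, it does some minimal error checking: a short header, an out-of-range N, or a short final block all stop processing.

```c
#include <stdio.h>

#define BLOCK_MAXSIZE 32

/* Reads the 2-byte big-endian block size N at the start of the file.
   Stores N in *n; returns 1 if 3 <= N <= BLOCK_MAXSIZE, 0 on a short read
   or an out-of-range N. */
int read_header(FILE *fp, size_t *n)
{
    unsigned char hdr[2];

    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return 0;
    *n = ((size_t)hdr[0] << 8) + hdr[1];    /* N = (byte0 << 8) + byte1 */
    return *n >= 3 && *n <= BLOCK_MAXSIZE;
}

/* Reads the next N-byte block into buf: buf[0..1] is field1 and
   buf[2..n-1] is field2.  Returns 1 on success, 0 at end of file, on a
   trailing partial block, or on a read error (feof() and ferror() tell
   them apart).  n must already have been validated by read_header(). */
int read_next_block(FILE *fp, unsigned char buf[BLOCK_MAXSIZE], size_t n)
{
    return fread(buf, 1, n, fp) == n;
}
```

A caller would typically write `if (read_header(fp, &n)) while (read_next_block(fp, buf, n)) { ... }` and copy field1 and field2 out of buf as needed.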

If you need the data in a friendlier format than a 32-byte buffer, you
can copy it out of buf into whatever is more convenient.

If you're *extremely* short on available memory, you can read
2 bytes directly into field1 and N-2 bytes directly into field2
(this saves you the 32-byte buffer). Assuming field1 represents
a 16-bit integer, don't forget about byte ordering.

Reading directly into a struct is a bad idea if you care about
portability (including to future versions of the same compiler).
If reading directly into a struct turns out to be the best approach
anyway, I'd add some asserts to ensure that the sizes and offsets
of field1 and field2 are what they need to be, so the program will
fail to run rather than operate incorrectly if the layout changes.
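The layout checks Keith describes could look like the following sketch. _Static_assert requires a C11 compiler (C11 was published after this thread); layout_ok() is a hypothetical run-time variant for older compilers, meant to be checked once at startup, for example with assert(layout_ok()).

```c
#include <stddef.h>

struct myStruct {
    unsigned char field1[2];
    unsigned char field2[30];
};

/* Compile-time checks: if a compiler ever pads the struct, the build
   stops here instead of the program silently misreading data. */
_Static_assert(offsetof(struct myStruct, field2) == 2,
               "unexpected padding between field1 and field2");
_Static_assert(sizeof(struct myStruct) == 32,
               "unexpected padding at the end of struct myStruct");

/* Pre-C11 alternative: returns 1 if the layout is as expected. */
int layout_ok(void)
{
    return offsetof(struct myStruct, field2) == 2
        && sizeof(struct myStruct) == 32;
}
```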

 
NicStevens
 
      09-12-2011
On Aug 30, 6:39 pm, Eric Sosman <(E-Mail Removed)> wrote:
> On 8/30/2011 6:05 PM, pozz wrote:
> [...]
> > field2_size = struct_size - offsetof(struct myStruct, field2)

>
> Same problem: It could be right or wrong (or "not even wrong"),
> because what you need to care about is who wrote the file and how.


may I ask what compiler offsetof works on? I tried gcc and msc and
both complain about offsetof(struct foo, bar)
 
J. J. Farrell
 
      09-12-2011
NicStevens wrote:
>
> may I ask what compiler offsetof works on? I tried gcc and msc and
> both complain about offsetof(struct foo, bar)


Every compiler which implements Standard C. It's a macro in <stddef.h>.
 
James Kuyper
 
      09-12-2011
On 09/12/2011 03:09 PM, NicStevens wrote:
> On Aug 30, 6:39 pm, Eric Sosman <(E-Mail Removed)> wrote:
>> On 8/30/2011 6:05 PM, pozz wrote:
>>
>>> I have a struct composed by two arrays of unsigned char.

>>
>>> struct myStruct {
>>> unsigned char field1[2];
>>> unsigned char field2[30];
>>> };

....
>>> field2_size = struct_size - offsetof(struct myStruct, field2)

>>
>> Same problem: It could be right or wrong (or "not even wrong"),
>> because what you need to care about is who wrote the file and how.

....
> may I ask what compiler offsetof works on? I tried gcc and msc and
> both complain about offsetof(struct foo, bar)


The following code should compile without complaint on any hosted
implementation of C which fully conforms to any version of the C
standard. I can't test it on msc, but it does compile without any
complaints with gcc, gcc -std=c89, and gcc -std=c99. I'd be surprised
at anything that dares to call itself a C compiler which would complain
about it:

#include <stddef.h>
struct foo { int bar; };
int main(void)
{
    return offsetof(struct foo, bar);
}

If you're running into problems with offsetof(struct foo, bar), it's
probably not due to the offsetof() expression itself, but to its
connection with the rest of your program. For instance, did you remember
to #include <stddef.h>?
If that's not the problem, can you provide a complete, short program
that demonstrates the problem, along with the command line that you used
to compile it, and the message that was generated complaining about it?


 
NicStevens
 
      09-14-2011
On Sep 12, 12:32 pm, James Kuyper <(E-Mail Removed)> wrote:
> [...]
>         #include <stddef.h>
>         struct foo { int bar; };
>         int main(void)
>         {
>             return offsetof(struct foo, bar);
>         }


Works on MSC
 