Array assignment via struct

 
 
Netocrat
 
      08-06-2005
On Sat, 06 Aug 2005 17:08:29 +0000, S.Tobias wrote:

> Netocrat <(E-Mail Removed)> wrote:
>> On Fri, 05 Aug 2005 21:52:07 -0700, Krishanu Debnath wrote:
>>> Jack Klein wrote:

>
>>> <snip>
>>>> > /* Assign array via struct */
>>>> >
>>>> > #include <stdio.h>
>>>> > #include <string.h>
>>>> > #include <stdlib.h>
>>>> >
>>>> > #define LEN 20
>>>> >
>>>> > typedef struct {
>>>> > char a[LEN];
>>>> > } S;
>>>> >
>>>> > int main(void) {
>>>> > S sa;
>>>> > char A[LEN];
>>>> >
>>>> > S *ps = malloc(LEN);

>>
>> It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
>> to use sizeof(S) than LEN.
>>
>>>> > char *pa = malloc(LEN);
>>>> >
>>>> > strcpy(sa.a, "Joe Wright Rocks");
>>>> > puts(sa.a);
>>>> >
>>>> > *(S*)A = sa;

> [snip]
>>>> Here is where you invoke undefined behavior, since A isn't dynamically
>>>> allocated. There is no guarantee that A meets the alignment
>>>> requirements for an S. The compiler might generate code that assumes
>>>> that A is, causing some sort of trap on some platforms, or possible
>>>> misaligned data or overwriting the destination array.

<snipped two lines above restored>
>>
>> Given that element a must be located at the start of struct S, and that it
>> is a char array of size LEN, it's hard to see how it could be aligned
>> differently to the char array A of size LEN. Are you referring to this
>> specific case or in general? If this case, could you explain how the
>> standard allows the alignments to be different?

>
> Type `char' has no alignment (ie. alignment(char) == 1), of course,
> but at issue is not `char', but rather `char[10]'. Long time ago


Actually Joe's code #define's LEN to 20, you're thinking of the OP.

> (don't ask me for details now) I read that on DEC stations character
> arrays in structs could have different alignments depending on their
> size, so for example `char[15]' could have different alignment than
> `char[31]'. All this was for purpose of memory access speed; ordinarily
> `char[ANY]' doesn't have alignment (at least when ANY is a prime number,
> for others I don't know), but when in a struct, a compiler
> could assume that the array is positioned at a "fast" location and
> generate more optimal code. (BTW, the discussion in which I read it
> was about why struct-hack didn't work.)


Well you've confirmed that it's not merely hypothetical - padding
actually is added in some real-world implementations. So to expand on
Jack's explanation of specific code being generated, perhaps something
like this:

4 padding bytes are added after the array of 20 char in the struct so
that it can be placed on an 8-byte boundary. The compiler generates
code to retrieve the elements of the array 8-bytes at a time and unaligned
access to 8-byte-wide data on this particular implementation is not
allowed.

The automatic char[20] variable A is not aligned on an 8-byte boundary, so
when it's accessed through the struct, unaligned access occurs and our
implementation spits the dummy.

So Joe - no go. Thou code be fraught.

<snip>
>>> may I have C&V for that?

>>
>> There is no constraint violation if that's what you mean.

>
> He meant "Chapter & Verse".


The & did seem a little out of place...

 
 
 
 
 
Joe Wright
 
      08-06-2005
Tim Rentsch wrote:
> Joe Wright <(E-Mail Removed)> writes:
>
>
>>All of S is an array of char. What alignment requirements
>>might there be for an S? None. Structures don't have
>>alignment requirements, their members do. What are the
>>alignment requirements of a char array?

>
>
> Structures can have alignment requirements that
> are different from those of their members, and
> here's a possible reason why they would.
>
> On platforms where a 'char *' is a different
> format and/or wider than an 'int *', an
> implementation might choose to make all structs
> be 'int' aligned, so that structure pointers
> would be easier to deal with.
>
> So a structure holding a character array would
> still need 'int' alignment, even though the
> contained character array would need only 'char'
> alignment.
>
> The alignment requirement also implies a sizing
> requirement, since alignment_of(T) must evenly
> divide 'sizeof(T)'. That's why a struct that
> holds only a character array might be bigger
> than the character array it holds.


You just made all that up didn't you?

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
 
 
 
 
 
Joe Wright
 
      08-06-2005
Netocrat wrote:
> On Sat, 06 Aug 2005 17:08:29 +0000, S.Tobias wrote:
>
>
>>Netocrat <(E-Mail Removed)> wrote:
>>
>>>On Fri, 05 Aug 2005 21:52:07 -0700, Krishanu Debnath wrote:
>>>
>>>>Jack Klein wrote:

>>
>>>><snip>
>>>>
>>>>>>/* Assign array via struct */
>>>>>>
>>>>>>#include <stdio.h>
>>>>>>#include <string.h>
>>>>>>#include <stdlib.h>
>>>>>>
>>>>>>#define LEN 20
>>>>>>
>>>>>>typedef struct {
>>>>>> char a[LEN];
>>>>>>} S;
>>>>>>
>>>>>>int main(void) {
>>>>>> S sa;
>>>>>> char A[LEN];
>>>>>>
>>>>>> S *ps = malloc(LEN);
>>>
>>>It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
>>>to use sizeof(S) than LEN.
>>>
>>>
>>>>>> char *pa = malloc(LEN);
>>>>>>
>>>>>> strcpy(sa.a, "Joe Wright Rocks");
>>>>>> puts(sa.a);
>>>>>>
>>>>>> *(S*)A = sa;

>>
>>[snip]
>>
>>>>>Here is where you invoke undefined behavior, since A isn't dynamically
>>>>>allocated. There is no guarantee that A meets the alignment
>>>>>requirements for an S. The compiler might generate code that assumes
>>>>>that A is, causing some sort of trap on some platforms, or possible
>>>>>misaligned data or overwriting the destination array.

>
> <snipped two lines above restored>
>
>>>Given that element a must be located at the start of struct S, and that it
>>>is a char array of size LEN, it's hard to see how it could be aligned
>>>differently to the char array A of size LEN. Are you referring to this
>>>specific case or in general? If this case, could you explain how the
>>>standard allows the alignments to be different?

>>
>>Type `char' has no alignment (ie. alignment(char) == 1), of course,
>>but at issue is not `char', but rather `char[10]'. Long time ago

>
>
> Actually Joe's code #define's LEN to 20, you're thinking of the OP.
>
>
>>(don't ask me for details now) I read that on DEC stations character
>>arrays in structs could have different alignments depending on their
>>size, so for example `char[15]' could have different alignment than
>>`char[31]'. All this was for purpose of memory access speed; ordinarily
>>`char[ANY]' doesn't have alignment (at least when ANY is a prime number,
>>for others I don't know), but when in a struct, a compiler
>>could assume that the array is positioned at a "fast" location and
>>generate more optimal code. (BTW, the discussion in which I read it
>>was about why struct-hack didn't work.)

>
>
> Well you've confirmed that it's not merely hypothetical - padding
> actually is added in some real-world implementations. So to expand on
> Jack's explanation of specific code being generated, perhaps something
> like this:
>
> 4 padding bytes are added after the array of 20 char in the struct so
> that it can be placed on an 8-byte boundary. The compiler generates
> code to retrieve the elements of the array 8-bytes at a time and unaligned
> access to 8-byte-wide data on this particular implementation is not
> allowed.
>
> The automatic char[20] variable A is not aligned on an 8-byte boundary, so
> when it's accessed through the struct, unaligned access occurs and our
> implementation spits the dummy.
>
> So Joe - no go. Thou code be fraught.
>

I think not. Consider..

struct {
    char a[17];
} sa;

...and explain any case for sizeof sa not being 17. Anecdotes of long
forgotten DEC Stations don't count.

> <snip>
>
>>>>may I have C&V for that?
>>>
>>>There is no constraint violation if that's what you mean.

>>
>>He meant "Chapter & Verse".

>
>
> The & did seem a little out of place...
>



--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---
 
 
Chris Torek
 
      08-06-2005
>Tim Rentsch wrote:
>> On platforms where a 'char *' is a different
>> format and/or wider than an 'int *', an
>> implementation might choose to make all structs
>> be 'int' aligned, so that structure pointers
>> would be easier to deal with.
>>
>> So a structure holding a character array would
>> still need 'int' alignment, even though the
>> contained character array would need only 'char'
>> alignment.
>>
>> The alignment requirement also implies a sizing
>> requirement, since alignment_of(T) must evenly
>> divide 'sizeof(T)'. That's why a struct that
>> holds only a character array might be bigger
>> than the character array it holds.


In article <(E-Mail Removed)>
Joe Wright <(E-Mail Removed)> wrote:
>You just made all that up didn't you?


He may well have made it up. But it was in fact the case on
the Data General MV/10000 (Eclipse), as I recall.

The Eclipse did actually have separate "word pointers" and "byte
pointers". Most pointers were word pointers; "char *" (and thus
"void *", had it existed) used byte pointers. To convert from byte
to word pointer, you shifted right one bit, discarding the byte-offset
and introducing a zero bit at the top (in the "indirect" bit that
appeared only in word pointers). To convert a word pointer to a
byte pointer, you shifted left one bit, discarding the top (indirect)
bit and introducing a zero bit at the bottom -- so that the resulting
byte pointer pointed to the first, even-numbered byte of the two
bytes that made up each word.

This machine exposed an awful lot of code-conformance problems,
even before the C standard existed. We had one at the University
of Maryland in the mid-1980s, before the 1989 C standard came out.
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
 
 
Netocrat
 
      08-07-2005
On Sat, 06 Aug 2005 18:22:50 -0400, Joe Wright wrote:
> Netocrat wrote:
>>>>>Jack Klein wrote:
>>>>>>Joe Wright wrote:

<snip>
>>>>>>>#define LEN 20
>>>>>>>
>>>>>>>typedef struct {
>>>>>>> char a[LEN];
>>>>>>>} S;
>>>>>>>
>>>>>>>int main(void) {
>>>>>>> S sa;
>>>>>>> char A[LEN];
>>>>>>>
>>>>>>> S *ps = malloc(LEN);

<snip>
>>>>>>> *(S*)A = sa;
>>>
>>>[snip]
>>>
>>>>>>Here is where you invoke undefined behavior, since A isn't dynamically
>>>>>>allocated. There is no guarantee that A meets the alignment
>>>>>>requirements for an S. The compiler might generate code that assumes
>>>>>>that A is, causing some sort of trap on some platforms, or possible
>>>>>>misaligned data or overwriting the destination array.

<snip>
>> So to expand on
>> Jack's explanation of specific code being generated, perhaps something
>> like this:
>>
>> 4 padding bytes are added after the array of 20 char in the struct so
>> that it can be placed on an 8-byte boundary. The compiler generates
>> code to retrieve the elements of the array 8-bytes at a time and unaligned
>> access to 8-byte-wide data on this particular implementation is not
>> allowed.
>>
>> The automatic char[20] variable A is not aligned on an 8-byte boundary, so
>> when it's accessed through the struct, unaligned access occurs and our
>> implementation spits the dummy.
>>
>> So Joe - no go. Thou code be fraught.


Correction: thy code be fraught. Thou codest flawed source.

> I think not. Consider..
>
> struct {
> char a[17];
> } sa;
>
> ..and explain any case for sizeof sa not being 17. Anecdotes of long
> forgotten DEC Stations don't count.


Why not? Were they not valid C implementation hosts?

That's a contrived choice: 17 is (2 pow 4) + 1, so no alignment can reduce
the number of accesses. Using my example above, consider a 20-byte array
accessed in 8-byte chunks. When aligned on an 8-byte (or 4-byte)
boundary, it will take 3 accesses to read/write the entire array in
8-byte chunks. It will take 4 accesses to do the same when it starts 1, 2
or 3 bytes short of an 8-byte boundary. That's a supportable reason
for properly aligning the struct on an 8-byte boundary and hence requiring
4 padding bytes.

As for your 17 byte example, well, this implementation may pad out 7
bytes but more likely it would pad 3 and it would access 4 bytes at a time
on a 4-byte boundary.

Totally hypothetical but for all I know (I don't have a lot of varied
hardware experience) a machine like this does exist. Not the machine that
I work from though (Intel P4) because unaligned access whilst slower is
not an error.

 
 
Tim Rentsch
 
      08-07-2005
Joe Wright <(E-Mail Removed)> writes:

> Tim Rentsch wrote:
> > Joe Wright <(E-Mail Removed)> writes:
> >
> >
> >>All of S is an array of char. What alignment requirements
> >>might there be for an S? None. Structures don't have
> >>alignment requirements, their members do. What are the
> >>alignment requirements of a char array?

> >
> >
> > Structures can have alignment requirements that
> > are different from those of their members, and
> > here's a possible reason why they would.
> >
> > On platforms where a 'char *' is a different
> > format and/or wider than an 'int *', an
> > implementation might choose to make all structs
> > be 'int' aligned, so that structure pointers
> > would be easier to deal with.
> >
> > So a structure holding a character array would
> > still need 'int' alignment, even though the
> > contained character array would need only 'char'
> > alignment.
> >
> > The alignment requirement also implies a sizing
> > requirement, since alignment_of(T) must evenly
> > divide 'sizeof(T)'. That's why a struct that
> > holds only a character array might be bigger
> > than the character array it holds.

>
> You just made all that up didn't you?


In fact, I didn't. I read about such platforms here
in comp.lang.c.

For example, consider a machine that addresses 64-bit
words natively. Pointers and ints are both 64 bits,
and use word addresses. A 64-bit word holds 8
eight-bit char's; a pointer to char uses a word
address, but puts the three bits that indicate which
char within the word in the high order bits of the
64-bit pointer. I'm doing this from memory, so I may
have some of the details wrong; however, other people
have written about C implementations on actual machines
that are very much like this.

It would be very natural on such a machine to have
all struct's be multiples of 8 in size, and aligned
on word boundaries.
 
 
Chris Croughton
 
      08-07-2005
On Sun, 07 Aug 2005 00:24:58 +1000, Netocrat
<(E-Mail Removed)> wrote:

> On Sat, 06 Aug 2005 08:45:05 -0400, Joe Wright wrote:
>> Netocrat wrote:
>>> On Fri, 05 Aug 2005 21:52:07 -0700, Krishanu Debnath wrote:
>>>>Jack Klein wrote:
>>>><snip>
>>>>
>>>>>>/* Assign array via struct */
>>>>>>
>>>>>>#include <stdio.h>
>>>>>>#include <string.h>
>>>>>>#include <stdlib.h>
>>>>>>
>>>>>>#define LEN 20
>>>>>>
>>>>>>typedef struct {
>>>>>> char a[LEN];
>>>>>>} S;
>>>>>>
>>>>>>int main(void) {
>>>>>> S sa;
>>>>>> char A[LEN];
>>>>>>
>>>>>> S *ps = malloc(LEN);
>>>
>>>
>>> It's possible, but unlikely, that sizeof(S) > LEN due to padding. Better
>>> to use sizeof(S) than LEN.
>>>

>> What padding could there be? S is essentially a char array.

>
> Yeah, that's why I called it a pedantic point later in the post. Probably
> the DS9000 is the only implementation to include padding. Anyhow you lose
> nothing by using sizeof(S) instead of LEN and you are assured of
> compliance.


No, it isn't a pedantic point, there are many systems where a struct is
rounded up in length to the "worst case" alignment size. In the case
given, it probably won't happen all that often because LEN is 20 which
is a multiple of 4 (although certain 64 bit machines may need alignment
to 8 byte boundaries). If LEN were an odd number a lot of systems would
round the size up to at least the nearest even number.

>>>>>> char *pa = malloc(LEN);
>>>>>>
>>>>>> strcpy(sa.a, "Joe Wright Rocks");
>>>>>> puts(sa.a);
>>>>>>
>>>>>> *(S*)A = sa;
>>>
>>>
>>> Here you are potentially copying and assigning more than the allocated
>>> (to src and dest) LEN bytes. A compiler might do this for performance
>>> reasons. It's probably unlikely and a little pedantic but the point is
>>> that what you're doing isn't guaranteed safe by the standard.
>>>

>> You're assuming sizeof sa might be greater than LEN. Why?

>
> As above - padding. I wrote that it might be added for performance
> reasons. I don't know if such reasons legitimately exist on a real-world
> implementation (I can contrive a far-fetched hypothetical implementation
> where they do), but you never know what code an optimising compiler is
> going to generate.


Or a non-optimising one. A fully optimising compiler might notice that
the only thing in the structure is a char array, and hence generate a
structure of length LEN, where a non-optimising one would "play it safe"
by making sure that it is rounded up to a safe alignment.

If there are non-char elements in the structure, of course, the size
will always be rounded up to the worst alignment needed by any field in
the structure. This is because it could be used as an array, and the
array accesses must be correctly aligned.

The only portable way to do malloc is to use the sizeof the actual thing
being allocated:

char *pa = malloc(sizeof(S));

or preferably

S *ps = malloc(sizeof(*ps));

Chris C
 
 
Lawrence Kirby
 
      08-08-2005
On Sat, 06 Aug 2005 08:02:38 -0400, Joe Wright wrote:

> Jack Klein wrote:
>> On Fri, 05 Aug 2005 16:24:25 -0400, Joe Wright <(E-Mail Removed)>
>> wrote in comp.lang.c:
>>
>>
>>>Lawrence Kirby wrote:


....

>>> *(S*)A = sa;

>>
>>
>> Here is where you invoke undefined behavior, since A isn't dynamically
>> allocated. There is no guarantee that A meets the alignment
>> requirements for an S. The compiler might generate code that assumes
>> that A is, causing some sort of trap on some platforms, or possible
>> misaligned data or overwriting the destination array.
>>

> All of S is an array of char. What alignment requirements might there be
> for an S? None. Structures don't have alignment requirements, their
> members do. What are the alignment requirements of a char array?


Any object type can have alignment requirements. A structure's alignment
requirements must meet the requirements of all of its members, but there's
nothing to stop it being stricter. The reason for doing this is speed,
word aligned access can be faster even for smaller objects. Consider for
example optimised strcpy() memcpy() etc. code that operates a word at a
time.


>> I strongly dislike people who write code like this. Especially if I
>> have to clean up after the 'clever' programmer. It would never pass a
>> code inspection at any shop with decent standards. Shops that don't
>> do code inspections don't have decent standards by definition.
>>

> You 'strongly dislike people' who try to get 'clever' with C in a
> newsgroup posting? Boy, are you tough.


When the "clever" method is obscure and possibly wrong (or not easy to
prove correct) and there is a "dumb", simple, clear and correct method
available I'd have to agree.

> I thought you'd get me for not checking the malloc() returns and not
> free()ing ps and pa before exit. You never know your luck.


There's that too.

Lawrence
 
 
Keith Thompson
 
      08-08-2005
Chris Croughton <(E-Mail Removed)> writes:
> On Sun, 07 Aug 2005 00:24:58 +1000, Netocrat
> <(E-Mail Removed)> wrote:

[snip]
>> Yeah, that's why I called it a pedantic point later in the post. Probably
>> the DS9000 is the only implementation to include padding. Anyhow you lose
>> nothing by using sizeof(S) instead of LEN and you are assured of
>> compliance.

>
> No, it isn't a pedantic point, there are many systems where a struct is
> rounded up in length to the "worst case" alignment size. In the case
> given, it probably won't happen all that often because LEN is 20 which
> is a multiple of 4 (although certain 64 bit machines may need alignment
> to 8 byte boundaries). If LEN were an odd number a lot of systems would
> round the size up to at least the nearest even number.


For example, given:

struct foo {
    char s[3];
};

it would make sense on many platforms to pad struct foo to 4 bytes and
require 4-byte alignment. That way, assigning a struct foo or passing
it as an argument could be done with a single 4-byte instruction, just
as for a 32-bit (assuming CHAR_BIT==8) integer.

On the other hand, an implementer might decide that copying structures
is rare enough that the extra padding isn't worthwhile. <OT>gcc
doesn't add extra padding, at least by default, at least on the one
platform where I tried this.</OT>

[snip]

>> As above - padding. I wrote that it might be added for performance
>> reasons. I don't know if such reasons legitimately exist on a real-world
>> implementation (I can contrive a far-fetched hypothetical implementation
>> where they do), but you never know what code an optimising compiler is
>> going to generate.

>
> Or a non-optimising one. A fully optimising compiler might notice that
> the only thing in the structure is a char array, and hence generate a
> structure of length LEN, where a non-optimising one would "play it safe"
> by making sure that it is rounded up to a safe alignment.


But note that the non-optimizing and fully optimizing compilers in
practice probably can't be the same compiler in different modes.
Given the way most compilers are invoked, you usually want to have the
same data layout in all modes, since a program can be built from
translation units that were compiled in different modes. (Or the
linker can forbid linking units compiled in different modes, but that
makes things more complicated.)

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 
 
Keith Thompson
 
      08-08-2005
Tim Rentsch <(E-Mail Removed)> writes:
> Joe Wright <(E-Mail Removed)> writes:

[snip]
>> You just made all that up didn't you?

>
> In fact, I didn't. I read about such platforms here
> in comp.lang.c.
>
> For example, consider a machine that addresses 64-bit
> words natively. Pointers and ints are both 64 bits,
> and use word addresses. A 64-bit word holds 8
> eight-bit char's; a pointer to char uses a word
> address, but puts the three bits that indicate which
> char within the word in the high order bits of the
> 64-bit pointer. I'm doing this from memory, so I may
> have some of the details wrong; however, other people
> have written about C implementations on actual machines
> that are very much like this.


Yes, Cray vector machines (at least the ones I've used) are like that.

> It would be very natural on such a machine to have
> all struct's be multiples of 8 in size, and aligned
> on word boundaries.


In fact, I just tried the following program on a Cray Y-MP:

#include <stdio.h>
int main(void)
{
    struct foo {
        char s[3];
    };
    printf("sizeof(struct foo) = %d\n", (int)sizeof(struct foo));
    return 0;
}

The output was:

sizeof(struct foo) = 8

--
Keith Thompson (The_Other_Keith) (E-Mail Removed) <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
 