Velocity Reviews


Kenny McCormack 06-09-2008 05:08 PM

One for the language lawyers
 
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
bar = (struct foo *) buffer;
fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
/* Now access the members of the struct (using, e.g., bar -> field1).
* Note that no actual struct was ever declared - we are using
* buffer as if it were the struct */
}


Harald van Dijk 06-09-2008 05:29 PM

Re: One for the language lawyers
 
On Mon, 09 Jun 2008 17:08:20 +0000, Kenny McCormack wrote:
> Here is a commonly used technique,


It is? Where have you seen it used?

> that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?


No.

> /* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */
>
> struct foo { int field1, field2; char nl; } *bar;


What's the nl member for?

> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
> bar = (struct foo *) buffer;


This assumes that buffer is appropriately aligned for a struct foo. When
you access *bar, you also ignore C's aliasing rules. Both problems can be
avoided by using a union.
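
A minimal sketch of the union approach, under stated assumptions: BUF_SIZE, union foo_buffer and read_foo() are hypothetical names that do not appear in the original post, and the record is read with fread from a binary stream rather than with fgets. The union is aligned for its strictest member, and the struct is accessed through the union instead of through a cast pointer.

```c
#include <assert.h>
#include <stdio.h>

#define BUF_SIZE 64  /* hypothetical size; the original post leaves it open */

struct foo { int field1, field2; char nl; };

/* The union is aligned for its strictest member, so `rec` is always
 * suitably aligned, and both views of the storage go through the union. */
union foo_buffer {
    struct foo rec;
    char bytes[BUF_SIZE];
};

/* Hypothetical helper: read one raw record from a binary stream.
 * Returns 1 on success, 0 on a short read or an error. */
int read_foo(FILE *fp, union foo_buffer *u)
{
    assert(fp != NULL && u != NULL);
    assert(sizeof u->rec <= sizeof u->bytes);
    return fread(u->bytes, sizeof u->rec, 1, fp) == 1;
}
```

After a successful read_foo(fp, &u), the members are available as u.rec.field1 and so on. This only addresses alignment and aliasing; the input still has to match this implementation's layout of struct foo.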

> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);


Did you mean fread, or were you really asking about fgets? If you meant
fread, I don't see the point of a nl member at all. If you meant fgets, I
don't see the point of a nl member at the very end.

> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


Walter Roberson 06-09-2008 05:31 PM

Re: One for the language lawyers
 
In article <g2jo24$ilh$1@news.xmission.com>,
Kenny McCormack <gazelle@xmission.xmission.com> wrote:
>Here is a commonly used technique, that will, of course, work fine on
>any reasonably modern, normal hardware. But, does it pass the CLC test?


>/* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */


>struct foo { int field1, field2; char nl; } *bar;
>char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


>int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


There may be unnamed padding between struct members for any reason,
so unless the data being read from stdin via fgets was written
with exactly the same compiler version on exactly the same target,
the code is not certain to work.

Some of the compilers I use *do* put unnamed padding in places
where it is not obvious to do so, in order to achieve better caching
performance.
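
One way to see what a given compiler does with struct foo is to ask it. The following is a small sketch; foo_padding() and print_foo_layout() are hypothetical helper names, and the numbers printed depend entirely on the implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct foo { int field1, field2; char nl; };

/* Total bytes of padding in struct foo on this implementation:
 * the size of the struct minus the sizes of its members. */
size_t foo_padding(void)
{
    size_t members = 2 * sizeof(int) + sizeof(char);
    assert(sizeof(struct foo) >= members);
    return sizeof(struct foo) - members;
}

/* Print where each member lives; the output varies between compilers,
 * targets and compiler options. */
void print_foo_layout(void)
{
    printf("sizeof(struct foo) = %zu\n", sizeof(struct foo));
    printf("offset of field1   = %zu\n", offsetof(struct foo, field1));
    printf("offset of field2   = %zu\n", offsetof(struct foo, field2));
    printf("offset of nl       = %zu\n", offsetof(struct foo, nl));
    printf("padding bytes      = %zu\n", foo_padding());
}
```

If the program that wrote the data reports different offsets from the program that reads it, overlaying the struct on the raw bytes will not give the values that were written.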


--
"Any sufficiently advanced bug is indistinguishable from a feature."
-- Rich Kulawiec

Jens Thoms Toerring 06-09-2008 05:35 PM

Re: One for the language lawyers
 
Kenny McCormack <gazelle@xmission.xmission.com> wrote:
> Here is a commonly used technique, that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?


> /* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */


> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


> int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


As long as sizeof(struct foo) isn't larger than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed (it may be difficult to make it
non-well-formed for the types of members the structure
has on main-stream hardware, but there might be some
systems where certain bit-patterns don't represent ints
and thus you may run into danger of undefined behaviour).
So figuring out what's well-formed can be a bit of a
bother but as long as you do that there's no problem.

Regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de

Hallvard B Furuseth 06-09-2008 05:45 PM

Re: One for the language lawyers
 
Kenny McCormack writes:
> Here is a commonly used technique, (...)


I hope not.

> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).


This breaks e.g. if there is a 0x0A byte (newline) in the integer
representation of the would-be bar->field1 value. And as Harald
said, it breaks if buffer is not properly aligned for a struct foo.

Also when I see fgets() I suspect the file has been opened in text
instead of binary mode, which means there may be bugs from converting
between newline and the file system's representation of end-of-line.
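
The newline problem can be demonstrated with a short sketch. bytes_consumed_by_fgets() is a hypothetical helper, not something from the original post: it writes the raw bytes of two ints to a temporary binary file and reports how far fgets got before it stopped.

```c
#include <assert.h>
#include <stdio.h>

/* Write the raw bytes of two ints to a temporary file, read them back
 * with fgets, and return how many bytes fgets consumed (-1 on failure).
 * fgets stops after the first byte equal to '\n', wherever it occurs. */
long bytes_consumed_by_fgets(int first, int second)
{
    char buffer[64];
    int values[2];
    long consumed = -1;
    FILE *fp;

    assert(sizeof buffer > sizeof values);
    values[0] = first;
    values[1] = second;

    fp = tmpfile();  /* opened in binary update mode */
    if (fp == NULL)
        return -1;
    if (fwrite(values, sizeof values, 1, fp) == 1) {
        rewind(fp);
        if (fgets(buffer, sizeof buffer, fp) != NULL)
            consumed = ftell(fp);
    }
    fclose(fp);
    return consumed;
}
```

With a first value of 10, whose representation contains a byte equal to '\n', fgets stops inside the first int and the second one is never read. With values such as 1 and 7, which contain no such byte, all of the bytes are consumed.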

--
Hallvard

Chris Torek 06-09-2008 10:30 PM

Re: One for the language lawyers
 
>Kenny McCormack <gazelle@xmission.xmission.com> wrote:
>> Here is a commonly used technique, that will, of course, work fine on
>> any reasonably modern, normal hardware. But, does it pass the CLC test?

>
>> /* Assume well-formed input - of course, you can always break it by
>> * feeding it bad input */
>> struct foo { int field1, field2; char nl; } *bar;
>> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

>
>> int main(void) {
>> bar = (struct foo *) buffer;
>> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
>> /* Now access the members of the struct (using, e.g., bar -> field1).
>> * Note that no actual struct was ever declared - we are using
>> * buffer as if it were the struct */
>> }


In article <6b57voF399cfmU1@mid.uni-berlin.de>,
Jens Thoms Toerring <jt@toerring.de> wrote:
>As long as sizeof(struct foo) isn't larger than
>SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)

>It's rather obfuscated and I dare to doubt that this is
>a "commonly used technique", but 'buffer' is memory
>you own so you can do with it whatever you want. Of
>course, all hinges on your primary assumption that the
>input is well-formed ...


More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.

This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

The trick also works if you use malloc() to obtain the buffer.
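
As a rough sketch of the malloc() variant: memory returned by malloc() is suitably aligned for any object type, so it can hold a struct foo directly. read_foo_alloc() is a hypothetical helper name, and fread on a binary stream is assumed in place of the original fgets call.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

struct foo { int field1, field2; char nl; };

/* Hypothetical helper: read one raw record into freshly allocated
 * storage. Returns NULL on allocation failure or a short read.
 * The caller is responsible for calling free() on the result. */
struct foo *read_foo_alloc(FILE *fp)
{
    struct foo *rec;

    assert(fp != NULL);
    rec = malloc(sizeof *rec);
    if (rec != NULL && fread(rec, sizeof *rec, 1, fp) != 1) {
        free(rec);
        rec = NULL;
    }
    return rec;
}
```

This removes the alignment problem only. The bytes in the stream must still match this implementation's size, padding and byte order for struct foo.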

In any case, it is not a very good idea to write the code this way,
because it places such strong constraints on what constitutes "well
formed" input. You need to make sure that these severe restrictions
on whatever uses the code are paid-for by whatever benefit you are
getting from this "commonly used technique" (which, in my experience,
was used perhaps once in the entire 4.xBSD code base -- that seems
to argue against the claim that it is "commonly used").
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html

Jens Thoms Toerring 06-09-2008 10:46 PM

Re: One for the language lawyers
 
Chris Torek <nospam@torek.net> wrote:
> >Kenny McCormack <gazelle@xmission.xmission.com> wrote:
> >> Here is a commonly used technique, that will, of course, work fine on
> >> any reasonably modern, normal hardware. But, does it pass the CLC test?

> >
> >> /* Assume well-formed input - of course, you can always break it by
> >> * feeding it bad input */
> >> struct foo { int field1, field2; char nl; } *bar;
> >> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

> >
> >> int main(void) {
> >> bar = (struct foo *) buffer;
> >> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> >> /* Now access the members of the struct (using, e.g., bar -> field1).
> >> * Note that no actual struct was ever declared - we are using
> >> * buffer as if it were the struct */
> >> }


> In article <6b57voF399cfmU1@mid.uni-berlin.de>,
> Jens Thoms Toerring <jt@toerring.de> wrote:
> >As long as sizeof(struct foo) isn't larger than
> >SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


> When I first built the 4.xBSD system for the SPARC, tftp broke,
> precisely because it used this kind of trick. (In tftp's case,
> it was a more complex variant of the "struct hack".)


> >It's rather obfuscated and I dare to doubt that this is
> >a "commonly used technique", but 'buffer' is memory
> >you own so you can do with it whatever you want. Of
> >course, all hinges on your primary assumption that the
> >input is well-formed ...


> More importantly, it depends on the variable "buffer" being
> properly aligned for all member accesses.


> This was not true on the SPARC, where the compiler put the
> big buffer on an odd byte boundary.


Yes, that's a point I forgot about. I should have known better,
having been bitten more than once by this issue when trying to port
(mostly other people's ;-) code to a different architecture. I
guess I am not too good a language lawyer ;-)

Best regards, Jens
--
\ Jens Thoms Toerring ___ jt@toerring.de
\__________________________ http://toerring.de

rahul 06-10-2008 04:30 AM

Re: One for the language lawyers
 
On Jun 10, 3:30 am, Chris Torek <nos...@torek.net> wrote:
>
> As a quick fix, I wrapped the buffer up into a union, which
> forced gcc to align the entire thing on an appropriate boundary.


A bit off the topic:

We can also use compiler-specific extensions to meet the alignment
and padding requirements. With gcc, __attribute__((packed)) eliminates
padding in structures, and the aligned attribute can be applied to the
buffer to force its alignment.
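
A sketch of what those two attributes look like, assuming GCC or a compatible compiler such as Clang. BUF_SIZE, struct packed_foo and get_aligned_buffer() are hypothetical names, and none of this is standard C.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_SIZE 64  /* hypothetical size */

struct foo { int field1, field2; char nl; };

/* GCC extension: no padding between or after the members. */
struct packed_foo {
    int field1, field2;
    char nl;
} __attribute__((packed));

/* GCC extension: give the char buffer the alignment of struct foo. */
static char buffer[BUF_SIZE]
    __attribute__((aligned(__alignof__(struct foo))));

/* Return the buffer after confirming the requested alignment. */
char *get_aligned_buffer(void)
{
    assert((uintptr_t)buffer % __alignof__(struct foo) == 0);
    return buffer;
}
```

The aligned attribute deals with the alignment of the buffer only. Reading it through a struct foo pointer still runs into the aliasing rules, and members of a packed struct may themselves be misaligned.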

Nick Keighley 06-10-2008 08:54 AM

Re: One for the language lawyers
 
On 9 Jun, 18:08, gaze...@xmission.xmission.com (Kenny McCormack)
wrote:

> Here is a commonly used technique, that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?
>
> /* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */
>
> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


I used it on real systems. Now it makes me nervous.
I've seen a system break when an OS was upgraded
due to this.

To use this I'd want to be *very* sure there was an
identical system at both ends. And always would be.
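
When the two ends cannot be guaranteed identical, one alternative is to define the external format explicitly and decode it field by field. The sketch below assumes a made-up wire format that is not from this thread: two 32-bit little-endian values followed by one character, nine bytes in total. decode_foo() and load_le32() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo { int field1, field2; char nl; };

/* Assumed wire format (an illustration, not from the thread):
 * bytes 0-3 field1, bytes 4-7 field2, both little-endian; byte 8 nl. */
enum { FOO_WIRE_SIZE = 9 };

/* Assemble a 32-bit value from four bytes, least significant first.
 * The result does not depend on the host's byte order or padding. */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Decode one record. Assumes the values are non-negative and fit in
 * an int on the receiving system. */
void decode_foo(const unsigned char wire[FOO_WIRE_SIZE], struct foo *out)
{
    assert(wire != NULL && out != NULL);
    out->field1 = (int)load_le32(wire);
    out->field2 = (int)load_le32(wire + 4);
    out->nl = (char)wire[8];
}
```

This costs a few more lines than overlaying the struct, but the result no longer depends on the compiler's padding, the buffer's alignment or the machine's byte order.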


--
Nick Keighley

Nick Keighley 06-10-2008 08:58 AM

Re: One for the language lawyers
 
On 10 Jun, 05:30, rahul <rahulsin...@gmail.com> wrote:
> On Jun 10, 3:30 am, Chris Torek <nos...@torek.net> wrote:
>
>
>
> > As a quick fix, I wrapped the buffer up into a union, which
> > forced gcc to align the entire thing on an appropriate boundary.

>
> A bit off the topic:
>
> We can also use compiler-specific extensions to meet the alignment
> and padding requirements. With gcc, __attribute__((packed)) eliminates
> padding in structures, and the aligned attribute can be applied to the
> buffer to force its alignment.


eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

I worked on systems that turned it on and off for
each structure in a large header...

I've hunted bugs when different packed/not packed options
had been used in different object files. It *linked* fine.

--
Nick Keighley

"Almost every species in the universe has an irrational fear of
#pragma packed. But they're wrong"


All times are GMT.