One for the language lawyers

 
 
Kenny McCormack
Guest
Posts: n/a
 
      06-09-2008
Here is a commonly used technique, that will, of course, work fine on
any reasonably modern, normal hardware. But, does it pass the CLC test?

/* Assume well-formed input - of course, you can always break it by
* feeding it bad input */

struct foo { int field1, field2; char nl; } *bar;
char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

int main(void) {
    bar = (struct foo *) buffer;
    fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
    /* Now access the members of the struct (using, e.g., bar -> field1).
     * Note that no actual struct was ever declared - we are using
     * buffer as if it were the struct */
}

 
 
 
 
 
Harald van Dijk
Guest
Posts: n/a
 
      06-09-2008
On Mon, 09 Jun 2008 17:08:20 +0000, Kenny McCormack wrote:
> Here is a commonly used technique,


It is? Where have you seen it used?

> that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?


No.

> /* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */
>
> struct foo { int field1, field2; char nl; } *bar;


What's the nl member for?

> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
> bar = (struct foo *) buffer;


This assumes that buffer is appropriately aligned for a struct foo. When
you access *bar, you also ignore C's aliasing rules. Both problems can be
avoided by using a union.

> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);


Did you mean fread, or were you really asking about fgets? If you meant
fread, I don't see the point of a nl member at all. If you meant fgets, I
don't see the point of a nl member at the very end.

> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }

 
 
 
 
 
Walter Roberson
Guest
Posts: n/a
 
      06-09-2008
In article <g2jo24$ilh$(E-Mail Removed)>,
Kenny McCormack <(E-Mail Removed)> wrote:
>Here is a commonly used technique, that will, of course, work fine on
>any reasonably modern, normal hardware. But, does it pass the CLC test?


>/* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */


>struct foo { int field1, field2; char nl; } *bar;
>char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


>int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


There may be unnamed padding between struct members for any reason,
so unless the data being read from stdin via fgets was written
with exactly the same compiler version on exactly the same target,
the code is not certain to work.

Some of the compilers I use *do* put unnamed padding in places
where it is not obvious to do so, in order to achieve better caching
performance.


--
"Any sufficiently advanced bug is indistinguishable from a feature."
-- Rich Kulawiec
 
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      06-09-2008
Kenny McCormack <(E-Mail Removed)> wrote:
> Here is a commonly used technique, that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?


> /* Assume well-formed input - of course, you can always break it by
> * feeding it bad input */


> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];


> int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).
> * Note that no actual struct was ever declared - we are using
> * buffer as if it were the struct */
> }


As long as sizeof(struct foo) isn't smaller than
SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.
It's rather obfuscated and I dare to doubt that this is
a "commonly used technique", but 'buffer' is memory
you own so you can do with it whatever you want. Of
course, all hinges on your primary assumption that the
input is well-formed (it may be difficult to make it
non-well-formed for the types of members the structure
has on main-stream hardware, but there might be some
systems where certain bit-patterns don't represent ints
and thus you may run into danger of undefined behaviour).
So figuring out what's well-formed can be a bit of a
bother but as long as you do that there's no problem.

Regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
 
Hallvard B Furuseth
Guest
Posts: n/a
 
      06-09-2008
Kenny McCormack writes:
> Here is a commonly used technique, (...)


I hope not.

> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
> bar = (struct foo *) buffer;
> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> /* Now access the members of the struct (using, e.g., bar -> field1).


This breaks e.g. if there is a 0x0A byte (newline) in the integer
representation of the would-be bar->field1 value. And as Harald
said, it breaks if buffer is not properly aligned for a struct foo.

Also when I see fgets() I suspect the file has been opened in text
instead of binary mode, which means there may be bugs from converting
between newline and the file system's representation of end-of-line.

--
Hallvard
 
 
Chris Torek
Guest
Posts: n/a
 
      06-09-2008
>Kenny McCormack <(E-Mail Removed)> wrote:
>> Here is a commonly used technique, that will, of course, work fine on
>> any reasonably modern, normal hardware. But, does it pass the CLC test?

>
>> /* Assume well-formed input - of course, you can always break it by
>> * feeding it bad input */
>> struct foo { int field1, field2; char nl; } *bar;
>> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

>
>> int main(void) {
>> bar = (struct foo *) buffer;
>> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
>> /* Now access the members of the struct (using, e.g., bar -> field1).
>> * Note that no actual struct was ever declared - we are using
>> * buffer as if it were the struct */
>> }


In article <(E-Mail Removed)-berlin.de>,
Jens Thoms Toerring <(E-Mail Removed)> wrote:
>As long as sizeof(struct foo) isn't smaller than
>SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


When I first built the 4.xBSD system for the SPARC, tftp broke,
precisely because it used this kind of trick. (In tftp's case,
it was a more complex variant of the "struct hack".)

>It's rather obfuscated and I dare to doubt that this is
>a "commonly used technique", but 'buffer' is memory
>you own so you can do with it whatever you want. Of
>course, all hinges on your primary assumption that the
>input is well-formed ...


More importantly, it depends on the variable "buffer" being
properly aligned for all member accesses.

This was not true on the SPARC, where the compiler put the
big buffer on an odd byte boundary.

As a quick fix, I wrapped the buffer up into a union, which
forced gcc to align the entire thing on an appropriate boundary.

The trick also works if you use malloc() to obtain the buffer.

In any case, it is not a very good idea to write the code this way,
because it places such strong constraints on what constitutes "well
formed" input. You need to make sure that these severe restrictions
on whatever uses the code are paid-for by whatever benefit you are
getting from this "commonly used technique" (which, in my experience,
was used perhaps once in the entire 4.xBSD code base -- that seems
to argue against the claim that it is "commonly used").
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html
 
 
Jens Thoms Toerring
Guest
Posts: n/a
 
      06-09-2008
Chris Torek <(E-Mail Removed)> wrote:
> >Kenny McCormack <(E-Mail Removed)> wrote:
> >> Here is a commonly used technique, that will, of course, work fine on
> >> any reasonably modern, normal hardware. But, does it pass the CLC test?

> >
> >> /* Assume well-formed input - of course, you can always break it by
> >> * feeding it bad input */
> >> struct foo { int field1, field2; char nl; } *bar;
> >> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];

> >
> >> int main(void) {
> >> bar = (struct foo *) buffer;
> >> fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
> >> /* Now access the members of the struct (using, e.g., bar -> field1).
> >> * Note that no actual struct was ever declared - we are using
> >> * buffer as if it were the struct */
> >> }


> In article <(E-Mail Removed)-berlin.de>,
> Jens Thoms Toerring <(E-Mail Removed)> wrote:
> >As long as sizeof(struct foo) isn't smaller than
> >SOMENUMBERWHATEVERFLOATSYOURBOAT then there's no problem.


> When I first built the 4.xBSD system for the SPARC, tftp broke,
> precisely because it used this kind of trick. (In tftp's case,
> it was a more complex variant of the "struct hack".)


> >It's rather obfuscated and I dare to doubt that this is
> >a "commonly used technique", but 'buffer' is memory
> >you own so you can do with it whatever you want. Of
> >course, all hinges on your primary assumption that the
> >input is well-formed ...


> More importantly, it depends on the variable "buffer" being
> properly aligned for all member accesses.


> This was not true on the SPARC, where the compiler put the
> big buffer on an odd byte boundary.


Yes, that's a point I forgot about. I should have known better,
having been bitten more than once by this issue when trying to port
(mostly other people's) code to a different architecture. I
guess I am not too good a language lawyer.

Best regards, Jens
--
\ Jens Thoms Toerring ___ (E-Mail Removed)
\__________________________ http://toerring.de
 
 
rahul
Guest
Posts: n/a
 
      06-10-2008
On Jun 10, 3:30 am, Chris Torek <(E-Mail Removed)> wrote:
>
> As a quick fix, I wrapped the buffer up into a union, which
> forced gcc to align the entire thing on an appropriate boundary.


A bit off the topic:

We can also use compiler specific extensions to achieve the alignment
and padding
requirements. In case of gcc, __attribute__((packed)) for eliminating
padding for structures.
We can also use aligned attributes for buffer to coerce the alignment.
 
 
Nick Keighley
Guest
Posts: n/a
 
      06-10-2008
On 9 Jun, 18:08, (E-Mail Removed) (Kenny McCormack)
wrote:

> Here is a commonly used technique, that will, of course, work fine on
> any reasonably modern, normal hardware. But, does it pass the CLC test?
>
> /* Assume well-formed input - of course, you can always break it by
>  * feeding it bad input */
>
> struct foo { int field1, field2; char nl; } *bar;
> char buffer[SOMENUMBERWHATEVERFLOATSYOURBOAT];
>
> int main(void) {
>     bar = (struct foo *) buffer;
>     fgets(buffer,SOMENUMBERWHATEVERFLOATSYOURBOAT,stdin);
>     /* Now access the members of the struct (using, e.g., bar -> field1).
>      * Note that no actual struct was ever declared - we are using
>      * buffer as if it were the struct */
> }


I used it on real systems. Now it makes me nervous.
I've seen a system break when an OS was upgraded
due to this.

To use this I'd want to be *very* sure there was an
identical system at both ends. And always would be.


--
Nick Keighley
 
 
Nick Keighley
Guest
Posts: n/a
 
      06-10-2008
On 10 Jun, 05:30, rahul <(E-Mail Removed)> wrote:
> On Jun 10, 3:30 am, Chris Torek <(E-Mail Removed)> wrote:
>
>
>
> > As a quick fix, I wrapped the buffer up into a union, which
> > forced gcc to align the entire thing on an appropriate boundary.

>
> A bit off the topic:
>
> We can also use compiler specific extensions to achieve the alignment
> and padding
> requirements. In case of gcc, __attribute__((packed)) for eliminating
> padding for structures.
> We can also use aligned attributes for buffer to coerce the alignment.


eek!!! These things are different on every compiler. And sometimes
don't exist. Some hardware cannot support it (or it becomes *very*
inefficient).

I worked on systems that turned it on and off for
each structure in a large header...

I've hunted bugs when different packed/not packed options
had been used in different object files. It *linked* fine.

--
Nick Keighley

"Almost every species in the universe has an irrational fear of
#pragma packed. But they're wrong"
 
 
 
 