overflow problem?
Hi all,
I'm having a bit of a problem with a piece of code I've written that I think comes down to essentially an overflow, and I thought I'd check with this group :) The code is included below, and I'd just like to say before all the flames happen that i) I *know* it's extremely poor programming style, but it is intended to be that hard to read, and ii) I don't think anyone really has to try and follow the code to answer my question.

I'm still ducking, though.

So, the problem is this: I'm using one element of my char array (let's call it mask for clarity) as a bit mask. An ASCII char value is then bitwise ANDed with it. Am I right in thinking that when I left-shift 10000000 in a char, it rolls over (so to speak) and I get 11111111 back out? The behaviour of the programme seems to confirm this, but I'd like an expert opinion :)

CODE:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Any sufficiently advanced technology is indistinguishable from magic.

       Arthur C. Clarke */

    int bzs(char*c){
        for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
            ( *++c << 1 ), *c<<=1 ) ;
        return (*++c&1 << (*c- * --c))-1;
    }

    int main( int argc, char **argv ) {
        if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
            +5))) ) {
            FILE*f,* g;
            if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
                fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
                while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
                );
        }
        return 0;

    }

Incidentally, the code compiles cleanly :)
Re: overflow problem?
Greg <nospam@nospam.com> writes:
> [...]
> for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
> ( *++c << 1 ), *c<<=1 ) ;
> [...]

"*c-- += *c++" has undefined behavior; it modifies an object (the pointer c) twice between sequence points.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
"We must do something. This is something. Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
Re: overflow problem?
On 11/08/2011 05:16 PM, Greg wrote:
> Hi all,
> I'm having a bit of problem with a piece of code I've written that I
> think comes down to essentially an overflow, and I thought I'd check
> with this group :) The code is included below, and I'd just like to say
> before all the flames happen that i) I *know* it's extremely poor
> programming style, but it is intended to be that hard to read, and ii)
> that I don't think anyone really has to try and follow the code to
> answer my question.
>
> I'm still ducking though.
>
> So, the problem is this: I'm using one element of my char array, let's
> call it mask for clarity, as a bit mask. An ascii char value is then
> bitwise anded with this. Am I right in thinking that when I leftshift
> 10000000 for a char that it rolls over (so to speak) and I get 11111111
> back out? The behaviour of the programme seems to confirm this, but I'd
> like an expert opinion :)

That program makes extensive use of plain char. Plain char can be either signed or unsigned, and the answers are quite different in each case. I'll assume, as you seem to be, that CHAR_BIT==8.

If char is signed:
If the bit pattern is 10000000, the '1' is presumably the sign bit. Which number it represents would depend upon which of the three representations allowed by C is used:
sign and magnitude - negative zero
two's complement: -256
one's complement: -255

For signed integers, if E1 has a non-negative value, and E1 * pow(2,E2) is representable in the result type, then that is the resulting value; otherwise the behavior is undefined. The only representation where 10000000 is non-negative is sign and magnitude, in which case the result is exactly 0.

If char is unsigned:
The bits 10000000 represent 128. The value of E1 << E2 is E1*pow(2,E2), reduced modulo 1 more than the maximum representable value; in this case, 256. For any value of E2 greater than 0, the result is therefore 0.
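One detail worth adding to this analysis: the operands of << undergo the integer promotions first, so a char operand is actually shifted as an int (on typical systems, where int is wider than char). For unsigned char, 128 << 1 is therefore 256 as an int, and the reduction to 0 happens when that int is stored back into the char, as *c<<=1 does. For signed char, 10000000 promotes to a negative int under two's or one's complement, and left-shifting a negative value is undefined in C99 and later, so the conclusion above is unchanged. A small sketch, assuming CHAR_BIT == 8 and a wider int; both function names are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* Step 1: v is promoted to int before the shift, so nothing is lost yet. */
static int shift_promoted(unsigned char v)
{
    return v << 1;              /* 0x80 -> 256 */
}

/* Step 2: storing the int result back into an unsigned char reduces it
   modulo UCHAR_MAX + 1 (256 when CHAR_BIT == 8). */
static unsigned char shift_and_store(unsigned char v)
{
    unsigned char c = v;
    assert(UCHAR_MAX == 255);   /* the 8-bit assumption used in the thread */
    c <<= 1;                    /* equivalent to c = (unsigned char)(c << 1) */
    return c;                   /* 0x80 -> 0, not 0xFF */
}
```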
Therefore, the only way you could get the binary value 11111111 by shifting 10000000 by a valid shift count using a conforming implementation of C would be if the behavior was undefined. If no other aspect of the program were suspect, that would imply that on your system char is signed, and uses either a two's complement or a one's complement representation. Unfortunately, your code has undefined behavior for several other reasons, so you can't be sure which issue is actually responsible for that value.

> #include <stdio.h>
> #include <stdlib.h>
> #include <math.h>
>
> /* Any sufficiently advanced technology is indistinguishable from magic.
>
> Arthur C. Clarke */
>
> int bzs(char*c){
> for(*++c= !(*++c = 1); (log(*c--) / log(2) < 9); *c-- += *c++ &
> ( *++c << 1 ), *c<<=1 ) ;

The first expression in the for statement changes the value of c twice without an intervening sequence point. The third expression does so three times. Both have undefined behavior.

If *c is negative, log(*c--) is a domain error: log() then returns an implementation-defined value (a NaN on IEEE 754 systems) rather than anything meaningful. Since the value of *c depends upon the input file, if char is signed, this program does nothing to avoid that possibility.

> return (*++c&1 << (*c- * --c))-1;

That expression updates the value of c twice without an intervening sequence point. More undefined behavior.

> }
>
> int main( int argc, char **argv ) {
> if( argc == 2 && (argv[0]=malloc(sizeof(char)*(strlen(argv[1])
> +5))) ) {

Your code calls strlen() without a declaration of that function in scope (it is declared in <string.h>, which the program doesn't include). If it compiled without diagnostics, you must be using C90, where strlen() will be implicitly declared as returning an int. Since it actually returns a size_t, that alone is sufficient to cause problems, particularly if sizeof(size_t) is different from sizeof(int).
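For the strlen() point, a minimal sketch of the output-name step with <string.h> included, so strlen() is properly declared as returning size_t. The helper make_bzs_name is hypothetical. The sizeof ".bzs" term is the 4 characters of the suffix plus the terminating null, which is what the original +5 accounts for; the sizeof(char) factor can be dropped, since it is 1 by definition.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* declares strlen() as returning size_t */

/* Hypothetical helper: return a malloc'd string "<in>.bzs", or NULL if
   allocation fails. The caller is responsible for free(). */
static char *make_bzs_name(const char *in)
{
    size_t len;
    char *out;

    assert(in != NULL);
    len = strlen(in) + sizeof ".bzs";   /* suffix (4) + '\0' (1) */
    out = malloc(len);
    if (out != NULL)
        sprintf(out, "%s.bzs", in);     /* fits exactly in len bytes */
    return out;
}
```

Given "data.txt", it should return "data.txt.bzs"; unlike the original, it doesn't reuse argv[0] as scratch storage.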
> FILE*f,* g;
> if( sprintf( argv[0], "%s.bzs", argv[1] ), !(((f =
> fopen( argv[1], "r" )) == NULL) || ((g=fopen( argv[0],"w"))==NULL) ))
> while( fscanf(f, "%c",argv[0]) != EOF ) fprintf( g, "%d", bzs(argv[0])
> );
> }
> return 0;
>
> }

I think I understand what main() is doing; at least, I understand it better than I understand what bzs() was intended to do. But I wouldn't dare declare main() free of defects.
Re: overflow problem?
On Tue, 08 Nov 2011 18:26:46 -0500, James Kuyper wrote:
> I'll assume, as you seem to be, that CHAR_BIT==8.

Actually, you appear to be assuming that CHAR_BIT==9 ;)

> If char is signed:
> If the bit pattern is 10000000, the '1' is presumably the sign bit.
> Which number it represents would depend upon which of the three
> representations allowed by C is used:
> sign and magnitude - negative zero
> two's complement: -256
> one's complement: -255

The last two should be -128 and -127 respectively for CHAR_BIT==8.
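The corrected values can be checked mechanically. Here is a small sketch, assuming CHAR_BIT == 8, that decodes an 8-bit pattern under each of the three signed representations C90 and C99 allow (C23 later required two's complement). The names decode8 and enum repr are hypothetical. Negative zero has no distinct int value, so it comes back as 0.

```c
#include <assert.h>

enum repr { SIGN_MAGNITUDE, TWOS_COMPLEMENT, ONES_COMPLEMENT };

/* Hypothetical helper: the value the 8-bit pattern `bits` would denote
   under representation r. */
static int decode8(unsigned bits, enum repr r)
{
    assert(bits <= 0xFFu);
    if ((bits & 0x80u) == 0)
        return (int)bits;                          /* sign bit clear: same in all three */
    switch (r) {
    case SIGN_MAGNITUDE:
        return -(int)(bits & 0x7Fu);               /* 10000000 -> -0, i.e. 0 */
    case TWOS_COMPLEMENT:
        return (int)bits - 256;                    /* 10000000 -> -128 */
    case ONES_COMPLEMENT:
        return -(int)(~bits & 0x7Fu);              /* 10000000 -> -127 */
    }
    return 0;                                      /* not reached for valid r */
}
```

The same helper shows the mirror case: 11111111 is -127 in sign and magnitude, -1 in two's complement, and negative zero in one's complement.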
Re: overflow problem?
On 11/09/2011 04:05 AM, Nobody wrote:
> On Tue, 08 Nov 2011 18:26:46 -0500, James Kuyper wrote:
>
>> I'll assume, as you seem to be, that CHAR_BIT==8.
>
> Actually, you appear to be assuming that CHAR_BIT==9 ;)
>
>> If char is signed:
>> If the bit pattern is 10000000, the '1' is presumably the sign bit.
>> Which number it represents would depend upon which of the three
>> representations allowed by C is used:
>> sign and magnitude - negative zero
>> two's complement: -256
>> one's complement: -255
>
> The last two should be -128 and -127 respectively for CHAR_BIT==8.

It seems like almost every post I've made for the past several days has had at least one silly stupid mistake. I guess I need a vacation.

Someone has been saving up a list of several dozen "undocumented data outages" for several years, and it suddenly occurred to them that maybe they should report all of these outages to someone who can actually investigate why they happened, in time for the start of Collection 6 reprocessing. It's dull, boring work, with no one I can delegate any of it to. The boredom is relieved only by the terror of worrying about whether one of the outages might turn out to be due to a defect in one of my programs which will need a last-minute fix.

--
James Kuyper
Re: overflow problem?
James Kuyper writes:
> On 11/09/2011 04:05 AM, Nobody wrote:
> [...]
> It seems like almost every post I've made for the past several days has
> had at least one silly stupid mistake. I guess I need a vacation.
> [...]

Thanks to all who responded; that clears up what was going on. As far as it goes, I know I'm using undefined behaviour and taking advantage of the way my particular compiler and operating system work together, and I know this is severely frowned upon: that's why I didn't ask anyone to try and follow the code through. Nonetheless, had I not known it, I'd have appreciated having it pointed out to me :)
Re: overflow problem?
On Nov 9, 8:01 pm, Greg <nos...@nospam.com> wrote:
<snip>
> Thanks to all who responded; that clears up what was going on. As far as
> it goes, I know I'm using undefined behaviour and taking advantage of the
> way my particular compiler and operating system work together, and I know
> this is severely frowned upon: that's why I didn't ask anyone to try and
> follow the code through. Nonetheless, had I not known it, I'd have
> appreciated having it pointed out to me :)

In real-world programs some exploitation of undefined behaviour is probably inevitable. But relying on particular behaviour for *++c= !(*++c = 1) just seems pure madness!

If you change the compiler, the behaviour may change. If you upgrade the compiler, the behaviour may change. If you change optimisation settings, the behaviour may change. A small change to your program text may radically change the behaviour. The benefit/cost ratio (what benefit?) just seems too low to me.