pointer arithmetic question.
Hi All,
The following program is crashing.

    #include<stdio.h>
    #include<ctype.h>
    int main(void)
    {
        char s[] ="test";
        char *t=s;
        while(*t)
        {
            *t = toupper(*t++);
            //t++;
        }
        printf("\n%s\n",s);
        printf("\n");
        return 0;
    }

But according to me it may not have undefined behaviour. So let me describe my understanding line by line.

    char s[] ="test";

This line define a modifiable character array and initialize it to "test";

    char *t=s;

s is a pointer the first element of the character array "test" and has type as char *. That value is assigned to t. Which is legal.

    while(*t)

If the value of *t is '\0' , the expression while (*t) will be false and the loop will end.

    *t = toupper(*t++);

The = operator has associativity as right to left so it is evaluates from right to left so toupper(*t++); will be evaluated first the the value will be assigned to *t. ++ has higher precedence than * operator so t++ will be evaluated then the value of *t++ will be passed to the function toupper and as a side effect of t++ , t will be pointing to the next character in the s array. the return value of toupper will be place in the *t. This will continue until *t become '\0'.

If I do not increment 't' in this expression

    *t = toupper(*t);

but do it in next line it work as expected. Where my understanding going wrong?

I think the crash may be due to the following reason.
====================
In the expression *t = toupper(*t++); *t (the left hand side of = ) may have the updated value that is the result of t++. So if t++ point to '\0' and dereference that can cause undefined behaviour.

But how that can happen also ? As post increment operator's side effect will be taking effect after the sequence point in this case after execution of the code *t = toupper(*t++);

I am not sure. I find the C standard very cryptic to get answer. Is there any book which describe the standard with example also can be understood by non experts? Is there any easy way to know different causes of undefined behaviour in an expression?

Regards,
Somenath
Re: pointer arithmetic question.
somenath wrote:
> Hi All,
>
> The following program is crashing.
>
> #include<stdio.h>
> #include<ctype.h>
> int main(void)
> {
>     char s[] ="test";
>     char *t=s;
>     while(*t)
>     {
>         *t = toupper(*t++);
>         //t++;
>     }
>     printf("\n%s\n",s);
>     printf("\n");
>     return 0;
> }
>
> But according to me it may not have undefined behaviour. So let me
> describe my understanding line by line.
>
> char s[] ="test";
> This line define a modifiable character array and initialize it to
> "test";
>
> char *t=s;
>
> s is a pointer the first element of the character array "test" and has
> type as char *. That value is assigned to t. Which is legal.
>
> while(*t)
> If the value of *t is '\0' , the expression while (*t) will be false
> and the loop will end.
>
> *t = toupper(*t++);
> The = operator has associativity as right to left so it is evaluates
> from right to left so toupper(*t++); will be evaluated first the the
> value will be assigned to *t.

Associativity doesn't constrain evaluation order. The order of evaluation of the operands of = can occur in either order, just like the order of evaluation of the operands of most other binary operators. (Of course, the LHS of = is evaluated as an lvalue not as an rvalue, meaning it is evaluated to determine the destination of the assignment, rather than to determine a value.)

> ++ has higher precedence than *
> operator so t++ will be evaluated then the value of *t++ will be
> passed to the function toupper and as a side effect of t++ , t will
> be pointing to the next character in the s array.
> the return value of toupper will be place in the *t.

There is a sequence point after evaluation of the arguments to toupper and before the call. That means that the increment of t has to be completed before the call to toupper. However, since the evaluation of the LHS of the = may be performed either before or after evaluation of the RHS, the value of t used to store the result of the assignment may be either the old or the new value of t. In your case, based on the behavior you have observed, it appears to be using the new value of t. (Of course, since the behavior is undefined, it could be doing anything.)

> I think the crash may be due to the following reason.
> ====================
> In the expression *t = toupper(*t++); *t (the left hand side of = )
> may have the updated value that is the result of t++.

Yes, this is the cause of the crash.

> So if t++ point to '\0' and dereference that can cause undefined
> behaviour.

No, dereferencing a pointer that points to '\0' is fine; it is done all the time. But, the problem is that you overwrite that '\0', so the loop doesn't terminate.

> But how that can happen also ? As post increment operator's side
> effect will be taking effect after the sequence point in this case
> after execution of the code *t = toupper(*t++);

Side effects are guaranteed to take place *after* the *preceding* sequence point and *before* the *following* sequence point. You cannot expect a side effect to be delayed until the following sequence point. Also, in this case, the sequence point following t++ is the one before the call to toupper, not the one at the end of the full expression.

> I find the C standard very cryptic to get answer. Is there any easy
> way to know different causes of undefined behaviour in an expression?

The way for non-experts (and experts, too) to avoid undefined behaviour in an expression is to not apply side effects to any components of an expression that also appear elsewhere in the expression--except that it is okay to use the LHS of an assignment expression in the RHS. This rule may be overly restrictive, but it is easily understood and makes code easy to read and easy to write.
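For reference, here is a minimal sketch of the loop written so that t is read and modified in separate expressions, which is the variant the original poster reports as working. The function name upcase_in_place is an assumption made for illustration and does not appear in the thread. The cast to unsigned char is an extra precaution not discussed above: toupper expects a value representable as unsigned char (or EOF).

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Uppercase a NUL-terminated string in place.
   upcase_in_place is a hypothetical helper name, not from the thread. */
void upcase_in_place(char *s)
{
    assert(s != NULL);
    for (char *t = s; *t != '\0'; t++) {
        /* t is only read in this statement; the increment happens in the
           loop header, after the sequence point that ends the statement. */
        *t = (char)toupper((unsigned char)*t);
    }
}
```

Called on a modifiable array such as char s[] = "test";, this leaves "TEST" in s and stops at the terminating '\0', which is never overwritten.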
Re: pointer arithmetic question.
On 2012-05-06, somenath <somenathpal@gmail.com> wrote:
> *t = toupper(*t++);

Since t is modified in this expression, its value may not simultaneously be used for any purpose other than determining the new value being stored into t.

Here, t is also used to compute the storage location where to put the result of toupper. This use is not divided from the t++ modification by a sequence point.

> *t = toupper(*t++);
> The = operator has associativity

*GONG*

Associativity is how you parse the symbols to make a parse tree. Evaluation is tree walking. A tree can be walked in many orders.

In C, the tree of an expression, so to speak, may be evaluated in any possible order, subject to only sequence points, which are basically assertions that one subtree must be done before another.

Once you use "associativity" or "precedence" in describing what you think the evaluation order should be, you've basically lost.

> from right to left so toupper(*t++); will be evaluated first the the

No, we have this tree

          =
         / \
        *   call()
       /    /    \
      t  toupper  args
                   /
                  *
                 /
            ++(post)
               /
              t

The only sequence point in this expression tree is before the function call. That is, once toupper is evaluated to a function pointer, and once the args are evaluated, and the function is ready to be called, a sequence point takes place, and then the function is invoked.

Furthermore, to complicate things, the completion of side effects can take place out of order with respect to the tree. Side effects can be gathered into a "queue" and then "flushed" at the next sequence point.

However, this is played out in the right subtree of the main = node. It has no bearing on when the left side of the = is evaluated. (Of course, when the function is being called, the evaluation of this tree is suspended!)

For instance, here is a possible order:

          = 9
         / \
      *8    call() 7
       /    /    \
      t  toupper  args
      1     4      /
                  * 5
                 /
            ++(post) 3,6
               /
              t 2

I put 3,6 next to the ++ because it has two events: computation of its value in the expression tree (the value yielded by ++, which is the prior value of t), and the event of updating t to the new value: the completion of the side effect. In my order above, I gave it #6: the increment happens just before 7, the call to the function (which is preceded by a sequence point). This update cannot be delayed past the function call.

You also have to keep in mind that parallel orders are possible, and there is optimization. Both 1 and 2 just access t so they can be merged.

Note how in my order, the dereference on the left happens at point 8, after the function call. But the value of t accessed for that purpose happens at point 1. So it's the old value of t.

Here is a different possible order:

          = 9
         / \
      *8    call() 7
       /    /    \
      t  toupper  args
      5     1      /
                  * 6
                 /
            ++(post) 3,4
               /
              t 2

Note this sneaky evaluation order. The t++ is evaluated early (steps 2, 3, 4), and to completion (4 means the side effect completes and t is updated). The very next step is 5, which is the evaluation of t in the left hand side. It now fetches the new value! Then evaluation goes back to the right side and completes the function call. The result of the call is assigned to the new location pointed at by the new t, not the old location.

In all orders, the = will be 9, because = is the root, and so it is visited last. The evaluation of the tree cannot just be any order whatsoever. It has to be a bottom-up traversal! But many bottom-up traversals are possible.

Precedence and associativity are related to evaluation order like this. They establish what is bottom and what is up. For instance, a + b * c gives us this tree:

        +
       / \
      a   *
         / \
        b   c

In all possible traversals, the * node is visited before the + node. However, the a, b and c nodes can be visited in any of six possible orders. They are all bottom nodes (leaf nodes) and we can pick any bottom node to be evaluated first.

The constraints are: * cannot be evaluated before b and c. (You can't multiply until you have the values of the multiplicands!) And + cannot happen until the * is done, and the value of a is known (you can't add until you have the two terms.)

Yet, six serial evaluation orders are possible, plus parallel evaluation.

--
If you ever need any coding done, I'm your goto man!
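The a + b * c point can be observed without any undefined behaviour by making each leaf a function call that logs itself. The following is only a sketch: note, sum_with_trace and last_trace are hypothetical names, and the values 2, 3 and 4 are arbitrary. Function calls are not interleaved with one another, so the program is well defined, but which leaf runs first is left to the implementation.

```c
#include <assert.h>
#include <stddef.h>

static char order_log[8];   /* tags of the leaves, in the order they ran */
static size_t order_len;

/* Record that a leaf was evaluated, then yield its value.
   note() is a hypothetical helper for this illustration. */
static int note(char tag, int value)
{
    assert(order_len < sizeof order_log - 1);
    order_log[order_len++] = tag;
    order_log[order_len] = '\0';
    return value;
}

/* Evaluate a + b * c with a = 2, b = 3, c = 4. */
int sum_with_trace(void)
{
    order_len = 0;
    order_log[0] = '\0';
    /* The parse tree fixes the result (2 + 12), not the leaf order. */
    return note('a', 2) + note('b', 3) * note('c', 4);
}

/* The leaf order observed during the last call to sum_with_trace(). */
const char *last_trace(void)
{
    return order_log;
}
```

The result is always 14, while last_trace() may report any of the six permutations of "abc" depending on the compiler and its options.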
Re: pointer arithmetic question.
On May 6, 7:25 am, Philip Lantz <p...@canterey.us> wrote:
> Associativity doesn't constrain evaluation order.

Here's an example to illustrate the point:

    x = a && (function1(b) + function2(b));

As far as associativity goes, the brackets mean that it is the result of the addition that is acted on by the &&. In mathematical terms you might say that the addition is "done first".

But as far as execution goes, it is very different. One of the rules of && is that, if the first operand is zero (meaning that the result must be zero), the second operand is not even evaluated. So here, the computer will first check whether a is zero, and if it is then the two functions will not be called. So it is doing the && (or part of it) before it does the addition.
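This is easy to check by counting calls. In the sketch below, the bodies of function1 and function2 and the wrapper calls_made are assumptions made for illustration; only the shape of the expression comes from the post.

```c
#include <assert.h>

static int calls;   /* how many times function1/function2 have run */

/* Placeholder bodies; the post does not say what these functions do. */
static int function1(int b) { calls++; return b + 1; }
static int function2(int b) { calls++; return b * 2; }

/* Evaluate the expression from the post and report how many of the
   two functions were actually called. calls_made is hypothetical. */
int calls_made(int a, int b)
{
    calls = 0;
    int x = a && (function1(b) + function2(b));
    assert(x == 0 || x == 1);   /* && always yields 0 or 1 */
    return calls;
}
```

With a equal to zero, calls_made reports 0 because the right operand is never evaluated; with a nonzero a it reports 2.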
Re: pointer arithmetic question.
pete <pfiland@mindspring.com> writes:
> pete wrote:
>>
>> somenath wrote:
>> >
>> > Hi All,
>> >
>> > The following program is crashing.
>>
>> > char s[] ="test";
>> > char *t=s;
>>
>> > *t = toupper(*t++);
>> > //t++;
>
>> > But how that can happen also ? As post increment operator's side
>> > effect will be taking effect after the sequence point in this case
>> > after execution of the code *t = toupper(*t++);
>
>> Also,
>> for the above definition of (t),
>> the opcode for this expression: (t++)
>> may or may not be
>> the same as the opcode for this expression: (++t, t-1).
>> Whether the side effect takes place before
>> or after the value of (t++)is determined,
>> is up to the implementation.
>
> That's the old way.
> In the new standard,
> there is a sequence point
> in the value of a postfix increment expression.
>
> n1570
> 6.5.2.4 Postfix increment and decrement operators
> 2 The value computation of the result is sequenced
> before the side effect of updating the stored value of the operand.

Not a sequence point but a sequenced-before relationship. A sequenced-before relationship is less restrictive than a sequence point. (The term 'sequence point' is a shorthand meaning all the value computations and side-effects of expressions before the sequence point are sequenced before all the value computations and side-effects of expressions after the sequence point. The exact definition is given in 5.1.2.3 p3.)

Note that the relationship described here is only between producing the result and updating the stored value. In the assignment

    *t = toupper( *t++ );

there is still no sequencing relationship between the subexpressions '*t' and '*t++'. Because neither of these subexpressions is required to be sequenced before the other, this runs afoul of the condition described in 6.5 p2, and hence is undefined behavior.
Re: pointer arithmetic question.
On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
> pete <pfiland@mindspring.com> writes:
>
>> pete wrote:
>>>
>>> somenath wrote:
>>> >
>>> > Hi All,
>>> >
>>> > The following program is crashing.
>>>
>>> > char s[] ="test";
>>> > char *t=s;
>>>
>>> > *t = toupper(*t++);
>>> > //t++;
>>
>>> > But how that can happen also ? As post increment operator's side
>>> > effect will be taking effect after the sequence point in this case
>>> > after execution of the code *t = toupper(*t++);
>>
>>> Also,
>>> for the above definition of (t),
>>> the opcode for this expression: (t++)
>>> may or may not be
>>> the same as the opcode for this expression: (++t, t-1).
>>> Whether the side effect takes place before
>>> or after the value of (t++)is determined,
>>> is up to the implementation.
>>
>> That's the old way.
>> In the new standard,
>> there is a sequence point
>> in the value of a postfix increment expression.
>>
>> n1570
>> 6.5.2.4 Postfix increment and decrement operators
>> 2 The value computation of the result is sequenced
>> before the side effect of updating the stored value of the operand.
>
> Not a sequence point but a sequenced-before relationship. A
> sequenced-before relationship is less restrictive than a sequence
> point.

No it isn't. It is exactly the same thing.

> (The term 'sequence point' is a shorthand meaning all the
> value computations and side-effects of expressions before the
> sequence point are sequenced before all the value computations

The expressions before a sequence point are only those which are listed as being before that particular sequence point.

> and side-effects of expressions after the sequence point. The
> exact definition is given in 5.1.2.3 p3.)

This is restricted to the subexpression in which it is happening, and so it doesn't cover all expressions.

So even though the comma operator has a sequence point, A is not sequenced before D:

    (A, B) + (C, D)

A sequence point amounts to the same thing as "sequenced before". To say that "A is evaluated, then a sequence point takes place, and then B" is logically equivalent to "A is sequenced before B".

> Note that the relationship described here is only between
> producing the result and updating the stored value. In the
> assignment
>
> *t = toupper( *t++ );
>
> there is still no sequencing relationship between the
> subexpressions '*t' and '*t++'.

This would still be true even if the wording was that there is a sequence point in the ++ operator.
Re: pointer arithmetic question.
Kaz Kylheku <kaz@kylheku.com> writes:
> On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
>> pete <pfiland@mindspring.com> writes:
>>
>>> pete wrote:
>>>>
>>>> somenath wrote:
>>>> >
>>>> > Hi All,
>>>> >
>>>> > The following program is crashing.
>>>>
>>>> > char s[] ="test";
>>>> > char *t=s;
>>>>
>>>> > *t = toupper(*t++);
>>>> > //t++;
>>>
>>>> > But how that can happen also ? As post increment operator's side
>>>> > effect will be taking effect after the sequence point in this case
>>>> > after execution of the code *t = toupper(*t++);
>>>
>>>> Also,
>>>> for the above definition of (t),
>>>> the opcode for this expression: (t++)
>>>> may or may not be
>>>> the same as the opcode for this expression: (++t, t-1).
>>>> Whether the side effect takes place before
>>>> or after the value of (t++)is determined,
>>>> is up to the implementation.
>>>
>>> That's the old way.
>>> In the new standard,
>>> there is a sequence point
>>> in the value of a postfix increment expression.
>>>
>>> n1570
>>> 6.5.2.4 Postfix increment and decrement operators
>>> 2 The value computation of the result is sequenced
>>> before the side effect of updating the stored value of the operand.
>>
>> Not a sequence point but a sequenced-before relationship. A
>> sequenced-before relationship is less restrictive than a sequence
>> point.
>
> No it isn't. It is exactly the same thing.

No, they are different. An example will illustrate.

The semantics for assignment includes a sequenced-before relationship. This relationship allows expressions like

    i = a[i] = i+1;

to have well-defined behavior, rather than being undefined behavior. Under the existing semantics, the two side-effects of this expression (ie, the updating of 'i' and 'a[i]') can occur in any order. If the sequenced-before relationship were instead a sequence point, then the side-effects of the operands would have to be completed before the store into 'i' can proceed. That is, the store into 'a[i]' must be done before the store into 'i' starts. That additional restriction doesn't hold under the current semantics, which specifies only a sequenced-before relationship.

The difference is evident if we consider an expression like

    a[i] = a[j] = 7;

If the semantics for assignment specified a sequence point, then this expression would have well-defined behavior even when i == j. As it is, under the current semantics which specifies only a sequenced-before relationship, when i == j this expression has undefined behavior, because there are two modifications to the same object with no sequencing relationship between them.

> [snip remainder]

P.S. Sorry about coming through aioe.org for this posting; temporary while eternel-september.org is offline or I can find another newsgroups hosting site.
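If i and j might be equal, one way to stay clear of the question is to split the chained assignment into two statements, so that a full-expression sequence point separates the stores. This is a sketch only; store_both is a hypothetical helper name, and the thread itself does not propose this rewrite.

```c
#include <assert.h>
#include <stddef.h>

/* Store the same value into a[i] and a[j]; well defined even when i == j.
   store_both is a hypothetical name used for illustration. */
void store_both(int *a, size_t i, size_t j, int value)
{
    assert(a != NULL);
    a[j] = value;   /* the end of this statement is a sequence point */
    a[i] = value;   /* a second store to the same element is now harmless */
}
```

The cost is one extra line, and the result no longer depends on how the sequencing rules for chained assignment are read.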
Re: pointer arithmetic question.
On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
> Kaz Kylheku <kaz@kylheku.com> writes:
>
>> On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
>>> pete <pfiland@mindspring.com> writes:
>>>
>>>> pete wrote:
>>>>>
>>>>> somenath wrote:
>>>>> >
>>>>> > Hi All,
>>>>> >
>>>>> > The following program is crashing.
>>>>>
>>>>> > char s[] ="test";
>>>>> > char *t=s;
>>>>>
>>>>> > *t = toupper(*t++);
>>>>> > //t++;
>>>>
>>>>> > But how that can happen also ? As post increment operator's side
>>>>> > effect will be taking effect after the sequence point in this case
>>>>> > after execution of the code *t = toupper(*t++);
>>>>
>>>>> Also,
>>>>> for the above definition of (t),
>>>>> the opcode for this expression: (t++)
>>>>> may or may not be
>>>>> the same as the opcode for this expression: (++t, t-1).
>>>>> Whether the side effect takes place before
>>>>> or after the value of (t++)is determined,
>>>>> is up to the implementation.
>>>>
>>>> That's the old way.
>>>> In the new standard,
>>>> there is a sequence point
>>>> in the value of a postfix increment expression.
>>>>
>>>> n1570
>>>> 6.5.2.4 Postfix increment and decrement operators
>>>> 2 The value computation of the result is sequenced
>>>> before the side effect of updating the stored value of the operand.
>>>
>>> Not a sequence point but a sequenced-before relationship. A
>>> sequenced-before relationship is less restrictive than a sequence
>>> point.
>>
>> No it isn't. It is exactly the same thing.
>
> No, they are different. An example will illustrate.
>
> The semantics for assignment includes a sequenced-before
> relationship. This relationship allows expressions like
>
> i = a[i] = i+1;
>
> to have well-defined behavior, rather than being undefined
> behavior.

I'm a fundamentalist believer in the literal interpretation of the value of an assignment expression being that of the left operand after the assignment.

I.e. to me it means one of two things.

1. The left operand has to be identified during the evaluation of the assignment expression. The side effect of updating that operand can happen later, but once the effective address of the operand is established, it does not change. That is to say, evaluation of the assignment expression and all of its constituents is complete before that expression yields a value, except possibly for delayed side effects. There should not be a (re-)evaluation of the raw expression a[i] at side effect time.

   A violation of this principle means that an expression's value is used, even though the expression has not been completely evaluated, or else that an expression which should be evaluated just once is being evaluated twice.

   The semantic description of assignment does not suggest that any part of the evaluation of the assignment may be delayed to the next sequence point, only the effect of updating the operand. Updating an operand is not the same thing as calculating an operand's effective address and then updating it. The object known as the operand is not known until the expression which designates it is calculated. There is no operand until then.

2. Or else "after the assignment" literally means after the complete assignment (side effect and all). This means that the expression's value is not available until the side effect completes.
Re: pointer arithmetic question.
Kaz Kylheku <kaz@kylheku.com> writes:
> On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
>> Kaz Kylheku <kaz@kylheku.com> writes:
>>
>>> On 2012-05-08, Tim Rentsch <txr@alumni.caltech.edu> wrote:
>>>> pete <pfiland@mindspring.com> writes:
>>>>
>>>>> pete wrote:
>>>>>>
>>>>>> somenath wrote:
>>>>>> >
>>>>>> > Hi All,
>>>>>> >
>>>>>> > The following program is crashing.
>>>>>>
>>>>>> > char s[] ="test";
>>>>>> > char *t=s;
>>>>>>
>>>>>> > *t = toupper(*t++);
>>>>>> > //t++;
>>>>>
>>>>>> > But how that can happen also ? As post increment operator's side
>>>>>> > effect will be taking effect after the sequence point in this case
>>>>>> > after execution of the code *t = toupper(*t++);
>>>>>
>>>>>> Also,
>>>>>> for the above definition of (t),
>>>>>> the opcode for this expression: (t++)
>>>>>> may or may not be
>>>>>> the same as the opcode for this expression: (++t, t-1).
>>>>>> Whether the side effect takes place before
>>>>>> or after the value of (t++)is determined,
>>>>>> is up to the implementation.
>>>>>
>>>>> That's the old way.
>>>>> In the new standard,
>>>>> there is a sequence point
>>>>> in the value of a postfix increment expression.
>>>>>
>>>>> n1570
>>>>> 6.5.2.4 Postfix increment and decrement operators
>>>>> 2 The value computation of the result is sequenced
>>>>> before the side effect of updating the stored value of the operand.
>>>>
>>>> Not a sequence point but a sequenced-before relationship. A
>>>> sequenced-before relationship is less restrictive than a sequence
>>>> point.
>>>
>>> No it isn't. It is exactly the same thing.
>>
>> No, they are different. An example will illustrate.
>>
>> The semantics for assignment includes a sequenced-before
>> relationship. This relationship allows expressions like
>>
>> i = a[i] = i+1;
>>
>> to have well-defined behavior, rather than being undefined
>> behavior.
>
> I'm a fundamentalist believer in the literal interpretation of
> the value of an assignment expression being that of the left
> operand after the assignment. I.e. to me it means one of two things.
>
> 1. The left operand has to be identified during the evaluation
> of the assignment expression. The side effect of updating
> that operand can happen later, but once the effective
> address of the operand is established, it does not change.
> That is to say, evaluation of the assignment expression and all of its
> constituents is complete before that expression yields a value, except
> possibly for delayed side effects. There should not be a
> (re-)evaluation of the raw expression a[i] at side effect time.
>
> A violation of this principle means that an expression's
> value is used, even though the expression has not been completely
> evaluated, or else that an expression which should be evaluated
> just once is being evaluated twice.
>
> The semantic description of assignment does not suggest that
> any part of the evaluation of the assignment may be delayed
> to the next sequence point, only the effect of updating the
> operand. Updating an operand is not the same thing as
> calculating an operand's effective address and then updating it.
> The object known as the operand is not known until the expression which
> designates it is calculated. There is no operand until then.

The Standard guarantees this. The side-effect of updating the stored value of the left operand of an assignment is sequenced after the value computations of the left and right operands. Also, more generally, the value computations of the operands of an operator (not just assignment) are sequenced before the value computation of the result of the operator. So the update of an assignment operator isn't started until all of its sub-expressions' values have been completely computed.

> 2. Or else "after the assignment" literally means after the
> complete assignment (side effect and all). This means that the
> expression's value is not available until the side effect
> completes.

The Standard does not guarantee this. That's why assignments like

    a[i] = a[j] = 7;

have undefined behavior when i == j;
Re: pointer arithmetic question.
On 2012-05-10, Tim Rentsch <txr@alumni.caltech.edu> wrote:
> Kaz Kylheku <kaz@kylheku.com> writes:
>> 2. Or else "after the assignment" literally means after the
>> complete assignment (side effect and all). This means that the
>> expression's value is not available until the side effect
>> completes.
>
> The Standard does not guarantee this. That's why assignments like
>
> a[i] = a[j] = 7;
>
> have undefined behavior when i == j;

Well, the sequencing (i.e. sequence point: same thing, as I contend) is between the evaluation of a[j] and 7, and the assignment. It is not "intervening" between the update of a[j] and a[i].

This doesn't show "sequenced before" is a different concept from "sequence point". Exactly like "sequenced before", sequence points can be localized within a subexpression, so that they do not intervene between unrelated evaluations in the surrounding full expression. This is well-known via the example

    (A, B) + (C, D)

where neither sequence point intervenes between A and D. (We don't even know whether A is evaluated first or D, or whether they are interleaved.)

A sequence point doesn't mean that all effects are settled and no new ones begin; just those in the scope of the operation to which the particular sequence point immediately belongs, which can be "comma operator" or "full expression", etc.
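The (A, B) + (C, D) example can be made concrete with logging calls. This is a rough sketch in which mark and comma_trace are hypothetical names; using function calls for A through D is an assumption that keeps the program well defined, since calls are not interleaved with each other. The asserts state only what each comma operator guarantees on its own side of the +.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static char trace[8];      /* tags in the order the calls actually ran */
static size_t trace_len;

/* Record a tag and return 1. mark() is a hypothetical helper. */
static int mark(char tag)
{
    assert(trace_len < sizeof trace - 1);
    trace[trace_len++] = tag;
    trace[trace_len] = '\0';
    return 1;
}

/* Evaluate (A, B) + (C, D) and return the observed order of the calls. */
const char *comma_trace(void)
{
    trace_len = 0;
    trace[0] = '\0';

    int sum = (mark('A'), mark('B')) + (mark('C'), mark('D'));
    assert(sum == 2);   /* each comma expression yields its right operand */

    /* Guaranteed: A runs before B, and C runs before D. */
    assert(strchr(trace, 'A') < strchr(trace, 'B'));
    assert(strchr(trace, 'C') < strchr(trace, 'D'));

    /* Not guaranteed: any ordering between the A/B pair and the C/D pair. */
    return trace;
}
```

Depending on the implementation the returned string may be "ABCD", "CDAB", or an interleaving such as "ACBD"; nothing in the language picks one.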