Velocity Reviews

André Wagner 05-08-2010 01:34 PM

Writing a C Compiler: lvalues
 
Hello,

I'm writing a C compiler. It's almost over, except that it is not
handling lvalues correctly.

Let me show an example. The code "x = 5" (let's say 'x' was declared
before) yields this in pseudo-assembly:

mov $b, $fp+8 ; $fp+8 is x's address, so I'm storing x's address in $b
mov $a, 5
mov [$b], $a ; here I'm putting what's in $a at the address pointed to by $b

Since 'x' is an lvalue in this case, I don't need its value, just the
address of the variable.

Now, if I want to access 'x' in the middle of a non-lvalue expression,
I would do:

mov $a, $fp+8
mov $a, [$a]

Notice how I get the variable's address and, from it, the value.

What I'm trying to say is: the compiler yields different assembly code
for when 'x' is an lvalue and when 'x' is not an lvalue.

This gets more confusing when I have expressions such as 'x++'. This
is simple, since 'x' is obviously an lvalue in this case. In the case
of the compiler, I can parse 'x' and see that the lookahead points to
'++', so it's an lvalue.

But what about '(x)++'? In this case, the compiler evaluates the
subexpression '(x)', and this expression results in the value of 'x', not
the address. Now I have a '++' ahead, so how can I know the address of
'x' since all that I have is a value?

All the documentation that I found about lvalues was too vague, and
directed to the programmer, not to the compiler writer. Are there
any specific rules for determining if the result of an expression is an
lvalue?

Thanks in advance,

André
[I believe the usual approach is to translate the expression into an
AST before doing much else, which has the useful effect of making the
parentheses go away. As you've found, in C you have to treat (x) and x
the same. It's not Fortran. -John]
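
The moderator's point can be illustrated with a minimal sketch of a recursive-descent parser for a toy grammar (single-letter identifiers, integer constants, parentheses and postfix '++' only). The node layout and the names parse_primary and parse_postfix are hypothetical, not taken from any poster's compiler. The parenthesized case returns the inner node unchanged, so 'x++' and '((x))++' produce the same tree and the '++' handler never sees "a value":

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST for a toy grammar: identifiers, integer constants,
   parentheses and postfix ++ only. There is no node kind for parentheses. */
enum node_kind { N_IDENT, N_CONST, N_POSTINC };

struct node {
    enum node_kind kind;
    char name;             /* N_IDENT: single-letter variable name */
    int value;             /* N_CONST: constant value */
    struct node *operand;  /* N_POSTINC: the expression being incremented */
};

struct node *new_node(enum node_kind kind)
{
    struct node *n = calloc(1, sizeof *n);
    assert(n != NULL);
    n->kind = kind;
    return n;
}

struct node *parse_postfix(const char **p);

/* primary := IDENT | CONST | '(' postfix ')'
   Returns NULL on a syntax error (nodes are not freed in this sketch). */
struct node *parse_primary(const char **p)
{
    while (**p == ' ') ++*p;
    if (**p == '(') {
        ++*p;
        struct node *inner = parse_postfix(p);
        while (**p == ' ') ++*p;
        if (inner == NULL || **p != ')') return NULL;
        ++*p;
        return inner;      /* the parentheses leave no trace in the tree */
    }
    if (isalpha((unsigned char)**p)) {
        struct node *n = new_node(N_IDENT);
        n->name = *(*p)++;
        return n;
    }
    if (isdigit((unsigned char)**p)) {
        struct node *n = new_node(N_CONST);
        while (isdigit((unsigned char)**p))
            n->value = n->value * 10 + (*(*p)++ - '0');
        return n;
    }
    return NULL;
}

/* postfix := primary { '++' } */
struct node *parse_postfix(const char **p)
{
    struct node *n = parse_primary(p);
    while (n != NULL) {
        while (**p == ' ') ++*p;
        if ((*p)[0] != '+' || (*p)[1] != '+') break;
        *p += 2;
        struct node *inc = new_node(N_POSTINC);
        inc->operand = n;  /* wrap whatever tree was parsed so far */
        n = inc;
    }
    return n;
}
```

With this shape, the decision between "address" and "value" is postponed until code is generated from the tree, where the parent operator is already known.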


Ben Bacarisse 05-09-2010 04:56 PM

Re: Writing a C Compiler: lvalues
 
André Wagner <andre.nho@gmail.com> writes:

> I'm writing a C compiler. It's almost over, except that it is not
> handling lvalues correctly.

<snip>
> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.


Yes, that's normal -- at least at the level of the abstract machine,
which seems to be roughly what your pseudo-assembler is.

> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.
>
> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x', not
> the address. Now I have a '++' ahead, so how can I know the address of
> 'x' since all that I have is a value?
>
> All the documentation that I found about lvalues was too vague, and
> directed to the programmer, not to the compiler writer. Are there
> any specific rules for determining if the result of an expression is an
> lvalue?


The C standard (draft PDF available here[1]) tells you which expression
forms denote lvalues and which don't. As you traverse the parse tree,
the "lead operator" of the tree will tell you whether you need l- or
r-value evaluation. The result will be rather naive code, but it is a
start.

[1] http://www.open-std.org/JTC1/SC22/WG...docs/n1256.pdf
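
One way to read "the lead operator tells you which evaluation you need" is a pair of mutually recursive generators, one that leaves an address in $a and one that leaves a value. The sketch below is only an illustration under assumptions: the AST layout, the names gen_addr and gen_value, and the push/pop instructions are hypothetical additions to the original poster's pseudo-assembly, and the output is as naive as predicted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical AST; parentheses never appear because the parser drops them. */
enum kind { K_IDENT, K_CONST, K_DEREF, K_ADD, K_ASSIGN };

struct node {
    enum kind kind;
    int value;               /* K_CONST: the constant; K_IDENT: offset from $fp */
    const struct node *lhs;  /* only operand of K_DEREF, left operand otherwise */
    const struct node *rhs;
};

struct out { char text[512]; size_t len; };

/* Appends one formatted line; the sketch simply asserts that it fits. */
void emit(struct out *o, const char *fmt, int arg)
{
    size_t room = sizeof o->text - o->len;
    int n = snprintf(o->text + o->len, room, fmt, arg);
    assert(n >= 0 && (size_t)n < room);
    o->len += (size_t)n;
}

int gen_value(const struct node *n, struct out *o);

/* l-value evaluation: leaves the ADDRESS of the designated object in $a.
   Returns -1 when the expression form cannot be an lvalue. */
int gen_addr(const struct node *n, struct out *o)
{
    switch (n->kind) {
    case K_IDENT:
        emit(o, "mov $a, $fp+%d\n", n->value);
        return 0;
    case K_DEREF:            /* the address of *p is the value of p */
        return gen_value(n->lhs, o);
    default:
        return -1;           /* e.g. a constant or a sum: emit a diagnostic */
    }
}

/* r-value evaluation: leaves the VALUE of the expression in $a. */
int gen_value(const struct node *n, struct out *o)
{
    switch (n->kind) {
    case K_CONST:
        emit(o, "mov $a, %d\n", n->value);
        return 0;
    case K_IDENT:
    case K_DEREF:            /* lvalue used for its value: address, then load */
        if (gen_addr(n, o) != 0) return -1;
        emit(o, "mov $a, [$a]\n", 0);
        return 0;
    case K_ADD:
        if (gen_value(n->lhs, o) != 0) return -1;
        emit(o, "push $a\n", 0);
        if (gen_value(n->rhs, o) != 0) return -1;
        emit(o, "pop $b\n", 0);
        emit(o, "add $a, $b\n", 0);
        return 0;
    case K_ASSIGN:           /* the operator asks for its left operand's address */
        if (gen_addr(n->lhs, o) != 0) return -1;
        emit(o, "push $a\n", 0);
        if (gen_value(n->rhs, o) != 0) return -1;
        emit(o, "pop $b\n", 0);
        emit(o, "mov [$b], $a\n", 0);
        return 0;
    }
    return -1;
}
```

The identifier node itself never decides anything: it is the parent (assignment, and in a fuller version '++', '--' and unary '&') that chooses gen_addr for the operand, which is why no lookahead is needed. A real compiler would also need a separate path for objects without an address, such as bit-fields.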
<snip>
--
Ben.


Tom St Denis 05-09-2010 05:25 PM

Re: Writing a C Compiler: lvalues
 
On May 8, 9:34 am, André Wagner <andre....@gmail.com> wrote:

> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.
>
> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.
>
> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x', not
> the address. Now I have a '++' ahead, so how can I know the address of
> 'x' since all that I have is a value?


++ requires an object that an address can be taken of attached to
either the right or left which forms part of a larger expression.

so it's really

(object)++

could be, for instance

(*(ptr + a))++

For all it matters.

I guess it depends on how you wrote your parser, but basically when
you encounter ++ it must either be before or after an expression whose
address is computable.

> All the documentation that I found about lvalues was too vague, and
> directed to the programmer, not to the compiler writer. Are there
> any specific rules for determining if the result of an expression is an
> lvalue?


Read the BNF grammar for C. The full grammar is in appendix A13 of K&R
C 2nd edition. Page 238 describes how to look at both postfix and prefix
expressions.

BTW I don't claim to be a compiler theory expert so that's about all
the help you're gonna get from me :-)

Tom

Keith Thompson 05-09-2010 08:01 PM

Re: Writing a C Compiler: lvalues
 
André Wagner <andre.nho@gmail.com> writes:
> ...
> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.


Of course.

> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.
>
> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x', not
> the address. Now I have a '++' ahead, so how can I know the address of
> 'x' since all that I have is a value?


In C, a parenthesized lvalue is an lvalue.

> All the documentation that I found about lvalues was too vague, and
> directed to the programmer, not to the compiler writer. Are there
> any specific rules for determining if the result of an expression is an
> lvalue?


The definitive document is the C standard. You can get a copy of the
1999 ISO C standard by sending money to your national standards body;
see, for example, webstore.ansi.org. Or you can get a free copy of
the latest post-C99 draft, incorporating the C99 standard plus the
three Technical Corrigenda, at
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>. The
Technical Corrigenda themselves are available at no charge.

I wouldn't even consider trying to implement a compiler without
having a copy of the standard for the language.

Some relevant passages:

C99 6.3.2.1p2:

Except when it is the operand of the sizeof operator, the unary &
operator, the ++ operator, the -- operator, or the left operand
of the . operator or an assignment operator, an lvalue that
does not have array type is converted to the value stored in
the designated object (and is no longer an lvalue).

C99 6.5.1p5:

A parenthesized expression is a primary expression. Its
type and value are identical to those of the unparenthesized
expression. It is an lvalue, a function designator, or a void
expression if the unparenthesized expression is, respectively,
an lvalue, a function designator, or a void expression.

Note that the definition of "lvalue" in C99 6.3.2.1p1 is flawed, or
at least incomplete. An lvalue is not merely "an expression with
an object type or an incomplete type other than void"; it's such
an expression that designates, or that could designate, an object.
For example, int is an object type, and 42 is an expression of
type int, but 42 is not an lvalue. On the other hand, if ptr is a
pointer-to-int, *ptr is an lvalue, even if ptr==NULL (but attempting
to use it invokes undefined behavior).
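
For a compiler writer, the passages above boil down to a decision made on the form of the expression, not on its run-time value. The following is a rough sketch of such a classifier. The enum, the struct and the name is_lvalue are hypothetical, type checking is left out (for instance, '*p' where p points to a function yields a function designator, not an lvalue), and only a handful of expression forms are covered:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression forms; a real compiler has many more. */
enum expr_kind {
    E_OBJECT_IDENT, E_FUNCTION_IDENT, E_CONSTANT, E_STRING_LITERAL,
    E_PAREN, E_SUBSCRIPT, E_DEREF, E_MEMBER_DOT, E_MEMBER_ARROW,
    E_POSTINC, E_ASSIGN, E_ADD, E_ADDRESS_OF, E_CALL
};

struct expr {
    enum expr_kind kind;
    const struct expr *operand;  /* first (or only) operand, NULL for leaves */
};

/* Decides lvalue-ness from the form of the expression alone.
   Assumes the operands have already been type-checked. */
bool is_lvalue(const struct expr *e)
{
    switch (e->kind) {
    case E_OBJECT_IDENT:    /* identifier declared as designating an object */
    case E_STRING_LITERAL:  /* string literals are lvalues (6.5.1) */
    case E_SUBSCRIPT:       /* a[i] is defined as *((a)+(i)) */
    case E_DEREF:           /* *p, even if p happens to be NULL at run time */
    case E_MEMBER_ARROW:    /* p->m */
        return true;
    case E_PAREN:           /* (e) is an lvalue exactly when e is (6.5.1p5) */
    case E_MEMBER_DOT:      /* s.m is an lvalue exactly when s is */
        return is_lvalue(e->operand);
    default:                /* constants, x++, a = b, a + b, &x, f(), ... */
        return false;
    }
}
```

A check like this is what lets the compiler reject '666--' with a diagnostic while accepting '(x)++'. Whether an lvalue is also modifiable (not const, not an array) is a separate test that this sketch does not attempt.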

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
Nokia

Eric Sosman 05-09-2010 09:09 PM

Re: Writing a C Compiler: lvalues
 
On 5/9/2010 1:25 PM, Tom St Denis wrote:
>
> ++ requires an object that an address can be taken of attached to
> either the right or left which forms part of a larger expression.


Yes to "object," no to "address can be taken." Examples:

register int obj1 = 42;
struct { int obj2 : 7; } s = { 42 };
++obj1; // okay
s.obj2++; // okay
&obj1; // constraint violation
&s.obj2; // constraint violation

--
Eric Sosman

Stargazer 05-10-2010 07:44 AM

Re: Writing a C Compiler: lvalues
 
On May 8, 4:34 pm, André Wagner <andre....@gmail.com> wrote:
> Hello,
>
> I'm writing a C compiler. It's almost over, except that it is not
> handling lvalues correctly.


It's not "almost over" then :-)

> Let me show an example. The code "x = 5" (let's say 'x' was declared
> before) yields this in pseudo-assembly:
>
> mov $b, $fp+8 ; $fp+8 is x's address, so I'm storing x's address in $b
> mov $a, 5
> mov [$b], $a ; here I'm putting what's in $a at the address pointed to by $b
>
> Since 'x' is an lvalue in this case, I don't need its value, just the
> address of the variable.
>
> Now, if I want to access 'x' in the middle of a non-lvalue expression,
> I would do:
>
> mov $a, $fp+8
> mov $a, [$a]


It looks like real x86 assembly, and it looks like you're jumping into
assembly generation too early.

> Notice how I get the variable's address and, from it, the value.
>
> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.
>
> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.


No, you can't assume that the programmer always writes correct code. A
programmer may make a mistake, as in Eric's example, or may write junk such as

if (heaven)
666--;

and the compiler must be able to determine that an assignment to a
non-lvalue takes place.

> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x', not
> the address. Now I have a '++' ahead, so how can I know the address of
> 'x' since all that I have is a value?


When I attempted to write a C compiler (I wrote the parser by hand), I
defined a "simpler C" pseudo-code - a subset of C, which allowed only
assignments of the form "__temp_NN = &var;", "__temp_NN = *__temp_MM;",
"*__temp_NN = __temp_MM;", "__temp_NN = ~__temp_MM;" (instead of "~"
there could be "!" or "-") and "__temp_NN = var1 + var2;" (instead of
"+" there could be any arithmetic or logic binary operator). Also
allowed were conditional branches of the form "if (__temp_NN != 0) goto
xxx;" and unconditional branches ("goto xxx;"). The "__temp_NN" were
temporary variables of a type suitable for machine registers; when
registers ran out, they were added as additional local variables.

Then "x" and "address of x" would be evaluated separately, something
like "__temp_1 = x;", then at the next sequence point: "__temp_2 = &x;
*__temp_2 = __temp_1". If "x" is not an l-value, the compiler will fail
while generating "__temp_2 = &x" and show a diagnostic.

Pseudo-code is a good thing: it allows easy debugging of the parser
and easy processing by the optimizer. Pseudo-code should be defined
so that it meets the C standard's requirements (consider that while for
programmers the standard is a guide, for a compiler writer it's an
SRS) and so that it includes only operations supported by any sensible
CPU architecture.
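
As a rough illustration of this kind of lowering, here is a small sketch that turns "operand++" into temporaries in the style of the "simpler C" described above. The function name lower_postinc, the exact sequence of temporaries and the use of the constant 1 as an operand are assumptions made for the example, not a reconstruction of the actual pseudo-code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Lowers "operand++" into temp-based pseudo-code written to out.
   The value of the whole expression ends up in __temp_2.
   Returns 0 on success, or -1 if the operand is not an lvalue (where a
   real compiler would print a diagnostic) or if the buffer is too small. */
int lower_postinc(const char *operand, bool operand_is_lvalue,
                  char *out, size_t size)
{
    if (!operand_is_lvalue)
        return -1;  /* "__temp_1 = &666;" can never be generated */

    int n = snprintf(out, size,
                     "__temp_1 = &%s;\n"           /* address, computed once  */
                     "__temp_2 = *__temp_1;\n"     /* old value = result      */
                     "__temp_3 = __temp_2 + 1;\n"  /* incremented value       */
                     "*__temp_1 = __temp_3;\n",    /* store back through addr */
                     operand);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Calling lower_postinc("x", true, buf, sizeof buf) fills buf with the four lines above, while lower_postinc("666", false, buf, sizeof buf) fails before anything is written. Since "&" here belongs to the intermediate language and not to C, the same scheme can still describe bit-fields and register variables, whose address a C program may not take.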

Note that while you don't need to care about anything that is
"undefined behavior" (the generated code need not be meaningful), you
must add special processing rules for the standard's constraints.


Marc van Lieshout 05-16-2010 08:20 PM

Re: Writing a C Compiler: lvalues
 
On 08-05-10 15:34, André Wagner wrote:
> What I'm trying to say is: the compiler yields different assembly code
> for when 'x' is an lvalue and when 'x' is not an lvalue.
>
> This gets more confusing when I have expressions such as 'x++'. This
> is simple, since 'x' is obviously an lvalue in this case. In the case
> of the compiler, I can parse 'x' and see that the lookahead points to
> '++', so it's an lvalue.
>
> But what about '(x)++'? In this case, the compiler evaluates the
> subexpression '(x)', and this expression results in the value of 'x', not
> the address. Now I have a '++' ahead, so how can I know the address of
> 'x' since all that I have is a value?
>


> All the documentation that I found about lvalues was too vague, and
> directed to the programmer, not to the compiler writer. Are there
> any specific rules for determining if the result of an expression is an
> lvalue?


An lvalue is an expression that evaluates to an address, so it *can* be
used on the left hand side of an assignment. But this is not necessarily
the case. In an expression like (x + 5) x *is* an lvalue, but it isn't
used as such, so it should be compiled as an ordinary rvalue. So in the
expression x = foo, x should be compiled as an lvalue (an address to
which a value is assigned), and in the expression foo = x, x should be
compiled to an rvalue (code that results in the value of x).

As far as I can tell, you're trying syntax-directed translation on a
C-like language. That can be done but, with a grammar like C's, you have to
postpone compilation of an identifier until you know how it's used.

If you want to see an example of immediate (syntax-directed)
compilation of a C-like language, look at the source code of David Betz's
BOB compiler. It compiles to bytecodes, which are interpreted by a
virtual machine.

The original DrDobbs article:

http://www.drdobbs.com/184409401

The latest sources via:

http://www.xlisp.org/

Eric Sosman 05-17-2010 01:00 PM

Re: Writing a C Compiler: lvalues
 
On 5/16/2010 4:20 PM, Marc van Lieshout wrote:
>
> An lvalue is an expression that evaluates to an address, so it *can* be
> used on the left hand side of an assignment.


That won't quite do. Here are two counter-examples, one an
expression that evaluates to an address but is not an lvalue:

malloc(42)

... and one an lvalue that cannot possibly involve an address:

register int x;
x = 42;

An lvalue (we're talking C here, right?) "is an expression with an
object type or an incomplete type other than void" (6.3.2.1p1).

--
Eric Sosman
esosman@ieee.org


Keith Thompson 05-17-2010 04:39 PM

Re: Writing a C Compiler: lvalues
 
Marc van Lieshout <marc@lithia.nl> writes:
[...]
> An lvalue is an expression that evaluates to an address, so it *can* be
> used on the left hand side of an assignment.


Note that this is cross-posted to comp.lang.c and comp.compilers.
I'm posting this from comp.lang.c, and I'm using the C standard's
definitions of terms. C's definition of "lvalue" isn't necessarily
consistent with wider usage. For details, see
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf>,
section 6.3.2.1.

No, an lvalue is (roughly) an expression that *designates an object*.
(I say "roughly" because ``*ptr'' is an lvalue even if ptr==NULL.)

An lvalue can designate an object that has no address, such as
a bit field or a register variable. Conversely, an expression
that evaluates to an address, such as ``&obj'' is not necessarily
an lvalue.

The distinction, in C, between computing the address of an object
and "designating" an object is subtle but important.

It's very likely that the code generated for evaluating an lvalue will
compute the address of the designated object, which is relevant if
you're writing a compiler, but as far as C is concerned that's an
implementation detail that's not covered by the standard.

> But this is not necessarily
> the case. In an expression like (x + 5) x *is* an lvalue, but it isn't
> used as such, so it should be compiled as an ordinary rvalue.


C99 6.3.2.1p2:

Except when it is the operand of [list of operators deleted],
an lvalue that does not have array type is converted to the value
stored in the designated object (and is no longer an lvalue).

So in (x + 5), x *isn't* an lvalue, even though it started out as one.

[...]

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>

Keith Thompson 05-19-2010 07:11 AM

Re: Writing a C Compiler: lvalues
 
Eric Sosman <esosman@ieee.org> writes:
> On 5/16/2010 4:20 PM, Marc van Lieshout wrote:
>>
>> An lvalue is an expression that evaluates to an address, so it *can* be
>> used on the left hand side of an assignment.

>
> That won't quite do. Here are two counter-examples, one an
> expression that evaluates to an address but is not an lvalue:
>
> malloc(42)
>
> ... and one an lvalue that cannot possibly involve an address:
>
> register int x;
> x = 42;
>
> An lvalue (we're talking C here, right?) "is an expression with an
> object type or an incomplete type other than void" (6.3.2.1p1).


Which is a horribly incomplete definition. 42 is an lvalue by
this definition (since int is an object type), but it's clearly
not intended to be an lvalue.

The essence of lvalue-ness is that an lvalue designates an object.
The trick is defining the term so that *ptr remains an lvalue even
if ptr==NULL.

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>

