Velocity Reviews

Velocity Reviews (http://www.velocityreviews.com/forums/index.php)
-   Ruby (http://www.velocityreviews.com/forums/f66-ruby.html)
-   -   [QUIZ] Bytecode Compiler (#100) (http://www.velocityreviews.com/forums/t835193-quiz-bytecode-compiler-100-a.html)

Ruby Quiz 11-03-2006 03:32 PM

[QUIZ] Bytecode Compiler (#100)
 
The three rules of Ruby Quiz:

1. Please do not post any solutions or spoiler discussion for this quiz until
48 hours have passed from the time on this message.

2. Support Ruby Quiz by submitting ideas as often as you can:

http://www.rubyquiz.com/

3. Enjoy!

Suggestion: A [QUIZ] in the subject of emails about the problem helps everyone
on Ruby Talk follow the discussion. Please reply to the original quiz message,
if you can.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

by Ross Bamford

Note: This quiz isn't really as much work as it might seem!

This quiz involves writing (in Ruby, of course) a compiler for basic arithmetic
expressions. The output from this compiler should be an array of unsigned
byte-sized ints, which can be fed into the included interpreter
(http://www.rubyquiz.com/interp.rb) in order to execute the compiled expression.

The bytecode format is very simple, while the interpreter is also very simple,
implemented as a stack machine. They have the following general characteristics:

* Bytecode is stored as an array of unsigned byte-sized Fixnums.
* All stack-bound numbers are signed integers
* The following operations are supported:
* Addition
* Subtraction
* Multiplication
* Division
* Raise to power
* Integer modulo
* Where an operator would return a floating point value,
the value is truncated to an integer.
* Short CONST and long LCONST instructions allow constants
to be pushed to the stack. These instructions expect their
operands to hold a signed short or long, respectively,
in network byte order.

Your compiler interface should be via a singleton method on a module 'Compiler',
taking a string, such that:

Compiler.compile('3+2')

Returns an array of instructions (and operands) that, when fed to the
interpreter, will execute the expression 3+2 and return the result (hopefully
5). For example, a correct (but non-optimal) result from the above call would
be:

[2,0,0,0,3, # LCONST 3
2,0,0,0,2, # LCONST 2
10] # ADD

Your compiler should support all basic arithmetic operations and explicit
precedence (parenthesis). As standard, syntax/precedence/ associativity/etc.
should follow Ruby itself. Obviously, specific implementation is entirely up to
you, though bear in mind that your compiler must be capable of running inline in
the same Ruby process as the interpreter, without affecting any code outside
itself.

The quiz also includes a number of tests
(http://www.rubyquiz.com/test_bytecode.rb) that will test your compiler's
functionality, with expressions becoming more complex as the tests go on. To
pass all the tests, a compiler will have to not only generate correct bytecode,
but it will also need to generate the shortest code it can for a given
expression.

Here is the bytecode spec:

# 0x01: CONST (cbyte1, cbyte2) ... => ..., const
Push a 15-bit signed integer to the stack.
The two bytes immediately following the instruction represent the
constant.

# 0x02: LCONST (cbyte1, cbyte2, cbyte3, cbyte4) ... => ..., const
Push a 31-bit signed integer to the stack.
The four bytes immediately following the instruction represent the
constant.

# 0x0a: ADD () ..., value1, value2 => ..., result
Pop the top two values from the stack, add them, and push the result
back onto the stack.

# 0x0b: SUB () ..., value1, value2 => ..., result
Pop the top two values from the stack, subtract value2 from value1,
and push the result back onto the stack.

# 0x0c: MUL () ..., value1, value2 => ..., result
Pop the top two values from the stack, multiply value1 by value2,
and push the result back onto the stack.

# 0x0d: POW () ..., value1, value2 => ..., result
Pop the top two values from the stack, raise value1 to the power of
value2, and push the result back onto the stack.

# 0x0e: DIV () ..., value1, value2 => ..., result
Pop the top two values from the stack, divide value1 by value2,
and push the result back onto the stack.

# 0x0f: MOD () ..., value1, value2 => ..., result
Pop the top two values from the stack, modulo value1 by value2,
and push the result back onto the stack.

# 0xa0: SWAP () ..., value1, value2 => ..., value2, value1
Swap the top two stack values.


Wilson Bilkovich 11-05-2006 03:43 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
On 11/3/06, Ruby Quiz <james@grayproductions.net> wrote:
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> by Ross Bamford
>
> Note: This quiz isn't really as much work as it might seem!
>
> This quiz involves writing (in Ruby, of course) a compiler for basic arithmetic
> expressions. The output from this compiler should be an array of unsigned
> byte-sized ints, which can be fed into the included interpreter
> (http://www.rubyquiz.com/interp.rb) in order to execute the compiled expression.
>



Well... this took me longer than it probably should have.
I didn't find out about this:
http://en.wikipedia.org/wiki/Shunting_yard_algorithm
..until I was pretty much done anyway.

Also, helpful tip I'd like to send back to my past self.. Regular
expressions aren't powerful enough to find matched parentheses.

Anyway, thanks for the quiz. Fun stuff.

class Compiler
def self.compile(input)
@bytecodes = {'+' => 0x0a, '-' => 0x0b, '*' => 0x0c, '**' => 0x0d,
'/' => 0x0e, '%' => 0x0f}
encode postfix(input)
end

def self.encode(tokens)
tokens.collect do |token|
number = token =~ /\-?\d+/ ? token.to_i : nil
if (-32768..32767).include?(number)
[0x01] + [number].pack('n').split(//).collect {|byte| byte[0]}
elsif !number.nil? # long
[0x02] + [number].pack('N').split(//).collect {|byte| byte[0]}
else
@bytecodes[token]
end
end.flatten
end

def self.postfix(infix)
stack, stream, last, op = [], [], nil, nil
tokens = infix.scan(/\d+|\*\*|[-+*\/()%]/)
tokens.each_with_index do |token,i|
case token
when /\d+/; stream << token
when *@bytecodes.keys
if token == '-' and last.nil? || (last =~ /\D/ && tokens[i+1] =~ /\d/)
tokens[i+1] = "-#{tokens[i+1]}"
else
stream << stack.pop while stack.any? && preceded?(stack.last, token)
stack << token
end
when '('; stack << token
when ')'; stream << op while (op = stack.pop) && (op != '(')
end
last = token
end
stream << op while op = stack.pop
stream
end

def self.preceded?(last, current)
ops = {'+' => 1, '-' => 1, '%' => 2, '/' => 2, '*' => 2, '**' =>
3, '(' => 0, ')' => 0}
ops[last] >= ops[current] rescue true
end
end


Bill Dolinar 11-05-2006 03:43 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
require 'interp'

module Compiler

# use eval and Value class below to compile
# expression into bytecode
def Compiler.compile(s)
s.gsub!(/([0-9]+)/, 'Value.new(stack, \1)')
stack = []
eval(s)
stack
end

class Value
attr_reader :number # constant value or nil for on stack
ON_STACK = nil

def initialize(stack, number)
@number = number
@stack = stack
end

# generate code for each binary operator (except -@)
# algorithm:
# push constants (or don't if already on stack)
# swap if necessary
# push bytecode
# create stack item
{'+' => Interpreter::Ops::ADD,
'-' => Interpreter::Ops::SUB,
'*' => Interpreter::Ops::MUL,
'**'=> Interpreter::Ops::POW,
'/' => Interpreter::Ops::DIV,
'%' => Interpreter::Ops::MOD}.each do |operator, byte_code|
Value.module_eval <<-FUNC
def #{operator}(rhs)
push_const(@number)
push_const(rhs.number)
# may need to swap integers on stack for all but plus
#{
if operator != "+"
"@stack << Interpreter::Ops::SWAP if rhs.number == nil &&
@number != nil"
end
}
@stack << #{byte_code}
Value.new(@stack, ON_STACK)
end
FUNC
end

def -@
if @number != ON_STACK
@number = -@number
push_const(@number)
else
push_const(@number)
push_const(0)
@stack << Interpreter::Ops::SWAP
@stack << Interpreter::Ops::SUB
end
Value.new(@stack, ON_STACK)
end

def push_const(number)
if number != ON_STACK
if (-32768..32767).include?(number)
@stack << Interpreter::Ops::CONST
else
@stack << Interpreter::Ops::LCONST
@stack << ((number >> 24) & 0xff)
@stack << ((number >> 16) & 0xff)
end
@stack << ((number >> 8) & 0xff)
@stack << (number & 0xff)
end
end
end
end




Cameron Pope 11-05-2006 04:06 PM

Re: Bytecode Compiler (#100)
 
The simplest way I found to do this problem was to let Ruby do the
legwork of parsing the expression for me, so I didn't have to worry
about things like parens or operator precedence.

I defined a Const and an Expr class and defined operators inside of
them to create a parse tree, added a to_const method to Fixum and
created a regular expression to convert an expression 1+1 to
'1.to_const() + 1.to_const()', so evaluating that expression would
produce a parse tree for that experssion.

Emitting the bytescodes is then a post-order traversal of the parse
tree.

---- Compiler.rb

# Operator overrides to create an expression tree. Mixed into
# Const and Expr so:
# Const <op> Const => Expr
# Const <op> Expr => Expr
# Expr <op> Const => Expr
module CreateExpressions
def +(other) Expr.new(:add, self, other) end
def -(other) Expr.new(:sub, self, other) end
def *(other) Expr.new(:mul, self, other) end
def /(other) Expr.new(:div, self, other) end
def %(other) Expr.new(:mod, self, other) end
def **(other) Expr.new(:pow, self, other) end
end

# Add a method to fixnum to create a const from an integer
class Fixnum
def to_const
Const.new(self)
end
end

# An integer value
class Const
include CreateExpressions
# Opcodes to push shorts or longs respectively onto the stack
OPCODES = {2 => 0x01, 4 => 0x02}

def initialize(i)
@value = i
end

def to_s
@value
end

# Emits the bytecodes to push a constant on the stack
def emit
# Get the bytes in network byte order
case @value
when (-32768..32767): bytes = [@value].pack("n").unpack("C*")
else bytes = [@value].pack("N").unpack("C*")
end
bytes.insert 0, OPCODES[bytes.size]
end
end

# A binary expression
class Expr
include CreateExpressions
OPCODES = {:add => 0x0a, :sub => 0x0b, :mul => 0x0c, :pow => 0x0d,
:div => 0x0e, :mod => 0x0f}

def initialize(op, a, b)
@op = op
@first = a
@second = b
end

# Emits a human-readable s-expression for testing
# (preorder traversal of parse tree)
def to_s
"(#{@op.to_s} #{@first.to_s} #{@second.to_s})"
end

# Bytecode emitter for an expression (postorder traversal of parse
tree)
def emit
# emit LHS, RHS, opcode
@first.emit << @second.emit << OPCODES[@op]
end
end

# Compile and print out parse tree for expressions
class Compiler
# Creates bytecodes from an arithmatic expression
def self.compile(expr)
self.mangle(expr).emit.flatten
end

# Prints a representation of the parse tree as an S-Expression
def self.explain(expr)
self.mangle(expr).to_s
end

private
# Name-mangles an expression so we create a parse tree when calling
# Kernel#eval instead of evaluating the expression:
# [number] => [number].to_const()
def self.mangle(expr)
eval(expr.gsub(/\d+/) {|s| "#{s}.to_const()"})
end
end


Dennis Ranke 11-05-2006 05:18 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
Here is my solution, a simple recursive descent parser. It's a bit more
code than is strictly necessary because it is loosely modeled after a
parser for a real language I have written recently.

require 'English'

class Compiler
def self.compile(expr)
return self.new(expr).compile.unpack('C*')
end

def initialize(expr)
# a very simple tokenizer
@tok = []
until expr.empty?
case expr
when /\A\s+/ # skip whitespace
# don't tokenize '1-1' as '1', '-1'
when (@tok.last.is_a? Integer) ? /\A\d+/ : /\A\-?\d+/
@tok << $MATCH.to_i
# any other character and '**' are literal tokens
when /\A\*\*|./
@tok << $MATCH
end
expr = $POSTMATCH
end
end

def compile
code = compile_expr(0)
raise "syntax error" unless @tok.empty?
return code
end

private

OPS = {'+'=>0xa, '-'=>0xb, '*'=>0xc, '**'=>0xd, '/'=>0xe, '%'=>0xf}

def compile_expr(level)
# get the tokens to parse at this precedence level
tok = [['+', '-'], ['*', '/', '%'], ['**']][level]
if tok
# if we are to actually parse a bi-op, do so
left = compile_expr(level + 1)
# for left-associative ops, find as many ops in a row as possible
while tok.include?(@tok.first)
op = OPS[@tok.shift]
# '**' is right-associative, so add a special case for that
right = compile_expr(op == OPS['**'] ? level : level + 1)
left << right + op.chr
end
return left
end
# if we are at a level higher than the ops, try to parse an
# atomic - either a numeral or an expression in paranthesis
tok = @tok.shift
if tok == '('
expr = compile_expr(0)
raise "')' expected" unless @tok.shift == ')'
return expr
end
raise 'number expected' unless tok.is_a? Integer
return (tok < -32768 || tok > 32767) ? [2, tok].pack('CN') :
[1, tok].pack('Cn')
end
end

if $0 == __FILE__
p Compiler.compile(ARGV[0])
end



James Edward Gray II 11-05-2006 06:21 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
On Nov 5, 2006, at 9:43 AM, Wilson Bilkovich wrote:

> Also, helpful tip I'd like to send back to my past self.. Regular
> expressions aren't powerful enough to find matched parentheses.


Perl's regex engine can do this, as can Ruby 1.9's Oniguruma engine.
Both allow recursive definitions, which is what it takes.

James Edward Gray II


James Edward Gray II 11-05-2006 06:22 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
On Nov 5, 2006, at 11:18 AM, Dennis Ranke wrote:

> Here is my solution, a simple recursive descent parser. It's a bit
> more code than is strictly necessary because it is loosely modeled
> after a parser for a real language I have written recently.


Did you write that language in Ruby? I'm just curious.

James Edward Gray II


Dennis Ranke 11-05-2006 06:39 PM

Re: [QUIZ] Bytecode Compiler (#100)
 
James Edward Gray II wrote:
> On Nov 5, 2006, at 11:18 AM, Dennis Ranke wrote:
>
>> Here is my solution, a simple recursive descent parser. It's a bit
>> more code than is strictly necessary because it is loosely modeled
>> after a parser for a real language I have written recently.

>
> Did you write that language in Ruby? I'm just curious.


I wrote the (fairly simple) compiler in Ruby and the bytecode
interpreter in C++. The whole thing is used as a scripting language for
a Nintendo DS game.


Wilson Bilkovich 11-06-2006 02:58 AM

Re: [QUIZ] Bytecode Compiler (#100)
 
On 11/5/06, Daniel Martin <martin@snowplow.org> wrote:
> Note that the creator of this quiz left out one important case from
> their tests:
>
> def test_02a
> assert_equal [2**1**2], Interpreter.new(Compiler.compile('2**1**2')).run
> end
>
> This tests that your compiler is properly making **
> right-associative. Some solutions already posted fail this test.
>
> 2**2**2 was an unfortunate test case to choose, since 2**4 == 4**2.


Good catch. Here's an updated copy of mine that:
A) Steals your cool C* unpack trick. :)
B) Gets rid of a temporary variable and a couple of 'pop' loops in
exchange for some more golf.
C) Avoids re-initializing the hashes for improved performance.
D) Passes your new test_02a

class Compiler
def self.compile(input)
@bytecodes ||= {'+' => 0x0a, '-' => 0x0b, '*' => 0x0c, '**' =>
0x0d, '/' => 0x0e, '%' => 0x0f}
encode postfix(input)
end

def self.encode(tokens)
tokens.collect do |token|
number = token =~ /\-?\d+/ ? token.to_i : nil
if (-32768..32767).include?(number)
[0x01] + [number].pack('n').unpack('C*')
elsif !number.nil? # long
[0x02] + [number].pack('N').unpack('C*')
else
@bytecodes[token]
end
end.flatten
end

def self.postfix(infix)
stack, stream, last = [], [], nil
tokens = infix.scan(/\d+|\*\*|[-+*\/()%]/)
tokens.each_with_index do |token,i|
case token
when /\d+/; stream << token
when *@bytecodes.keys
if token == '-' and last.nil? || (last =~ /\D/ && tokens[i+1] =~ /\d/)
tokens[i+1] = "-#{tokens[i+1]}"
else
stream << stack.pop while stack.any? && preceded?(stack.last, token)
stack << token
end
when '('; stack << token
when ')'; (stream += stack.slice!(stack.rindex('('),
stack.size).reverse).pop
end
last = token
end
stream += stack.reverse
end

def self.preceded?(last, current)
@ops ||= {'+' => 1, '-' => 1, '%' => 2, '/' => 2, '*' => 2, '**'
=> 3, '(' => 0, ')' => 0}
@ops[last] >= @ops[current] && current != '**' # right associative mayhem!
end
end


Mike Harris 11-06-2006 03:21 AM

Re: [QUIZ] Bytecode Compiler (#100)
 
My solution is a bit different than all the ones I've seen so far. I
had no intention of writing an expression parser :-) All it does is

1. Define a method (to_bc) to have a fixnum return its own bytecode
2. Redefine the array operators to return the appropriate bytecode
representations
3. Add .to_bc after every number in the expression string

Here it is:

class Compiler
def self.compile(str)

eval(str.gsub(/(\d+)([^\d])/,'\1.to_bc\2').gsub(/([^\d])(\d+)$/,'\1\2.to_bc'))
end
end

class Fixnum
def to_bc
return (self >= 0 ? [1,self/256,self%256] :
[1,(self+32768)/256+128,(self+32768)%256]) if self <= 32767 and self >=
-32768
res = [(24..30),(16..23),(8..15),(0..7)].map { |range| range.map {
|x| self[x] } }.map { |byte| byte.inject_with_index(0) { |s,x,i|
s+x*2**i } }
([2] << (self > 0 ? res[0] : res[0]+128) << res[1..3]).flatten
end
end

class Array
{:+ => 10, :- => 11, :* => 12, :** => 13, :/ => 14, :% => 15}.each do
|op,opcode|
define_method(op) { |x| self.concat(x).concat([opcode]) }
end
def inject_with_index(sum)
each_with_index { |x,i| sum = yield(sum,x,i) }
sum
end
end



All times are GMT. The time now is 10:28 PM.

Powered by vBulletin®. Copyright ©2000 - 2014, vBulletin Solutions, Inc.
SEO by vBSEO ©2010, Crawlability, Inc.