How to Build a Math Evaluation Engine

Jesus CastelloMay 25th, 2016Last Updated: May 23rd, 2016

0 90 4 minutes read

One of the first things you learn to do with Ruby is to evaluate a mathematical expression in irb. But what’s the magic behind this seemingly trivial operation? Let’s create our own evaluation engine and find out!

Note: You could cheat and just use eval, but that would defeat the point of this exercise.

Tokenizing Our Input

We are given a string as input, but what we need are numbers and operators. We can use the StringScanner class and ‘tag’ the different parts of the input so we can recognize them.

This is the main method of the Tokenizer class:

def parse(input)
  @buffer = StringScanner.new(input)

  until @buffer.eos?
    skip_spaces
    read_tokens
  end

  @tokens
end

In this method, I made a StringScanner object with the input string as an argument, then I have a loop which keeps going until the end of the string is reached. Inside this loop, I call two methods: skip_spaces and read_tokens.

The skip_spaces method is very simple:

def skip_spaces
  @buffer.skip(/\s+/)
end

The /\s+/ part is a regular expression that matches spaces (including tabs and newlines). Then we have read_tokens, which is where most of the work happens:

RULES.each do |regex, type|
  token = @buffer.scan(regex)

  if token
    @tokens << Token.new(type, token)
  end
end

In this method, we try to match part of the input string against one of our parsing rules, which are just regular expressions. When a match is found, we create a new token object with the type and value. For example, for the number ‘5’ we would have Token.new(:int, 5).

RULES = {
  /\d+/      => :int,
  /[\/+\-*]/ => :op
}

And that’s all we need to convert our input string into tokens. Now let’s see how we can turn those tokens into something more useful.

The Shunting Yard Algorithm

Now that we have our tokens, we need to convert them into a format we can work with. The normal format for a math expression is called ‘infix’. The problem with this format is that it’s hard to evaluate because of operation precedence. There is another way to represent a mathematical expression: the ‘postfix’ notation.

Examples:

Infix ___________	Postfix
3 + 4	3 4 +
9 * 4	9 4 *
2 * 5 + 1	2 5 * 1 +

We are going to use the Shunting-yard algorithm to convert from the infix notation to postfix. The way this algorithm works is by using two stacks, one for numbers and another for operators.

Follow these steps:

Read a token.
If it’s a number, put it in the numbers stack.
If it’s an operator, put it in the operators stack, but first, check the precedence of the operator on the top of the stack. Move it into the numbers stack if the precedence is higher or equal.
When you’ve read all the tokens, pop all the operators and put them into the numbers stack.
Done!

Let’s take a look at the code, starting with the initialize method:

def initialize(tokens)
  @tokens    = tokens
  @output    = []
  @operators = []
end

In this method, we do our initial setup by using the tokens from the previous phase (tokenization) and creating two arrays that will serve as our stacks. Let’s continue our exploration of the code:

@tokens.each do |token|
  push_number(token)      if token.type == :int
  process_operator(token) if token.type == :op
end

Inside the run method, we iterate over our tokens array and put them in the correct stacks.

def operator_priority_is_less_or_equal_than(token)
  @operators.last && (token.priority <= @operators.last.priority)
end

This method helps us handle operator precedence. The priority is a method defined in the Token class.

def priority
  return 1 if value == '+' || value == '-'
  return 2 if value == '*' || value == '/'
end

When we’re done, we just dump the remaining operators into our output stack.

@output += @operators.reverse

We need to call reverse here. Putting things into a stack and taking them out reverses them, but in this case, we want the original order.

And that’s all it takes to convert to postfix notation (also know as Reverse Polish Notation or RPN).

Evaluation

Now that we have our expression in postfix notation, it’s very easy to evaluate using a single stack.

Algorithm:

We put all the numbers we find on the numbers stack.
Whenever we find an operator, we pop two numbers from the stack.
Then we calculate the result and put the result back on the top of the stack.
Repeat until done. Last number will be the final result.

tokens.each do |token|
  @numbers << token if token.type == :int

  if token.type == :op
    right_num = @numbers.pop
    left_num  = @numbers.pop

    raise 'Invalid postfix expression' unless right_num && left_num

    result = evaluate(right_num.value, left_num.value, token)
    @numbers << Token.new(:int, result)
  end
end

This is the main loop for the evaluation method. Once we find an operation, we call the evaluate method and create a new token object with the results.

def evaluate(right_num, left_num, operation)
  case operation.value
  when '+' then left_num + right_num
  when '-' then left_num - right_num
  when '*' then left_num * right_num
  when '/' then left_num / right_num
  end
end

There is nothing special about this method; we just apply the operation and return the result.

@numbers.last.value

Once all the operators have been processed, we’ll have one token left on our numbers stack. This token is the final result.

Conclusion

In this post, you learned about tokenization, which lets you break a string into its component values. This is what most programming languages (like Ruby or JavaScript) do behind the scenes to your source code so they can turn it into computer instructions.

Then you learned about two different ways to arrange a mathematical expression, ‘infix’ and ‘postfix’, and how to convert between them by using the ‘Shunting-Yard algorithm’. You’ve also learned about the ‘stack’ data structure and what it can be useful for.

I hope you enjoyed this thought exercise! You can find the final code here: https://github.com/matugm/math-eval

Reference:

How to Build a Math Evaluation Engine from our WCG partner Jesus Castello at the Codeship Blog blog.