如何写一个简单的解释器（Interpreter）-2

Burger和Starbird合著过一本书《高效思考的五个要点》，里面提到Tony Plog，他是个吹喇叭的艺术家，开办了一个喇叭培训班。他班上都是成熟的喇叭演奏手。当这些演奏手吹奏复杂的歌曲的时候，吹的很精彩，但是当Tony让他们吹奏简单的音节的时候，听起来就很不成熟，甚至幼稚。Tony结果喇叭也吹了一次，听起来依然很成熟。众人惊诧不已。Tony解释说，吹好复杂的曲子容易，吹好简单的音节却很难，因为它要求吹奏的人对气息的控制力要非常强才可以。这给了我一个启发：构造足够复杂的系统，比如要聚焦在精通简单和基本的元素上。

是的，即使有时候你觉得花功夫在简单的事儿上，好像是退步一样。同样的道理，熟练地使用一个工具或者一个框架很重要，但是，懂得它们背后的原理更重要。正像Ralph Waldo Emerson说的：

”如果仅仅学会了一些方法，那你就会依赖它们解决遇到的每一个问题。但如果学会了方法背后的原理，你就是创造出自己的方法。“

回到interpreter和compiler上来，这一节我会带着你完善第一节的程序，让interpreter可以：

处理空白字符
处理多位数
支持减法操作

代码如下：

# Token types
# EOF (end-of-file) token is used to indicate that
# there is no more input left for lexical analysis
INTEGER, PLUS, MINUS, EOF = 'INTEGER', 'PLUS', 'MINUS', 'EOF'


class Token(object):
    def __init__(self, type, value):
        # token type: INTEGER, PLUS, MINUS, or EOF
        self.type = type
        # token value: non-negative integer value, '+', '-', or None
        self.value = value

    def __str__(self):
        """String representation of the class instance.

        Examples:
            Token(INTEGER, 3)
            Token(PLUS '+')
        """
        return 'Token({type}, {value})'.format(
            type=self.type,
            value=repr(self.value)
        )

    def __repr__(self):
        return self.__str__()


class Interpreter(object):
    def __init__(self, text):
        # client string input, e.g. "3 + 5", "12 - 5", etc
        self.text = text
        # self.pos is an index into self.text
        self.pos = 0
        # current token instance
        self.current_token = None
        self.current_char = self.text[self.pos]

    def error(self):
        raise Exception('Error parsing input')

    def advance(self):
        """Advance the 'pos' pointer and set the 'current_char' variable."""
        self.pos += 1
        if self.pos > len(self.text) - 1:
            self.current_char = None  # Indicates end of input
        else:
            self.current_char = self.text[self.pos]

    def skip_whitespace(self):
        while self.current_char is not None and self.current_char.isspace():
            self.advance()

    def integer(self):
        """Return a (multidigit) integer consumed from the input."""
        result = ''
        while self.current_char is not None and self.current_char.isdigit():
            result += self.current_char
            self.advance()
        return int(result)

    def get_next_token(self):
        """Lexical analyzer (also known as scanner or tokenizer)

        This method is responsible for breaking a sentence
        apart into tokens.
        """
        while self.current_char is not None:

            if self.current_char.isspace():
                self.skip_whitespace()
                continue

            if self.current_char.isdigit():
                return Token(INTEGER, self.integer())

            if self.current_char == '+':
                self.advance()
                return Token(PLUS, '+')

            if self.current_char == '-':
                self.advance()
                return Token(MINUS, '-')

            self.error()

        return Token(EOF, None)

    def eat(self, token_type):
        # compare the current token type with the passed token
        # type and if they match then "eat" the current token
        # and assign the next token to the self.current_token,
        # otherwise raise an exception.
        if self.current_token.type == token_type:
            self.current_token = self.get_next_token()
        else:
            self.error()

    def expr(self):
        """Parser / Interpreter

        expr -> INTEGER PLUS INTEGER
        expr -> INTEGER MINUS INTEGER
        """
        # set current token to the first token taken from the input
        self.current_token = self.get_next_token()

        # we expect the current token to be an integer
        left = self.current_token
        self.eat(INTEGER)

        # we expect the current token to be either a '+' or '-'
        op = self.current_token
        if op.type == PLUS:
            self.eat(PLUS)
        else:
            self.eat(MINUS)

        # we expect the current token to be an integer
        right = self.current_token
        self.eat(INTEGER)
        # after the above call the self.current_token is set to
        # EOF token

        # at this point either the INTEGER PLUS INTEGER or
        # the INTEGER MINUS INTEGER sequence of tokens
        # has been successfully found and the method can just
        # return the result of adding or subtracting two integers,
        # thus effectively interpreting client input
        if op.type == PLUS:
            result = left.value + right.value
        else:
            result = left.value - right.value
        return result


def main():
    while True:
        try:
            # To run under Python3 replace 'raw_input' call
            # with 'input'
            text = raw_input('calc> ')
        except EOFError:
            break
        if not text:
            continue
        interpreter = Interpreter(text)
        result = interpreter.expr()
        print(result)


if __name__ == '__main__':
    main()

可以自己试一下程序看看有没有问题

$ python calc2.py
calc> 27 + 3
30
calc> 27 - 7
20
calc>

这一版主要的改动包括：

get_next_token 方法被重构了一点点。pos指针自增的逻辑被重构成了一个方法，叫advance。
新增加了两个方法：skip_whitespace 用来忽略空白字符，integer 用来处理多位数。
expr 方法也被改了，可以同时识别INTEGER -> MINUS -> INTEGER 和 INTEGER -> PLUS -> INTEGER 这两个串。.

第一章介绍过 token 和词法分析器了，这一章我要谈谈词位（lexeme）、解析和解析器。

为了更好的理解token，词位这个概念躲不开。词位指的是一系列字符串，用来组成token。下图可以帮助理解。

还记得我们的老朋友—— expr 方法吗？他用来识别和解释算术表达式，得到正确的结果。想识别表达式，首先他就要做到识别短语，是加，还是减。对，这真是我们的老朋友的职责所在，他通过get_next_token得到token的序列，进而得知token序列的结构，按照结构来计算，最后得到计算结果。

得知token序列的结构，或者说识别token序列中的不同短语，这个过程就是解析。这部分interpreter和compiler的组成模块叫解析器。

具体说我们的老朋友吧，expr 方法首先解析 INTEGER -> PLUS -> INTEGER 或者INTEGER -> MINUS -> INTEGER 这些短语，解析成功任何一个以后，他就开始解释（interpret）它，并计算出结果。

如何写一个简单的解释器（Interpreter）-2

猜你喜欢