语法分析器根据语法将标记流(来自词法分析器)转换为语法树。
目标
从“词法分析器” 任务中获取输出,并根据以下语法将其转换为抽象语法树(AST)。输出应为展平格式。
程序应从文件和/或stdin读取输入,并将输出写入文件和/或stdout。如果使用的语言具有解析器模块/库/类,则提供两种版本的解决方案将是很好的选择:一个不带解析器模块,另一个带解析器模块。
语法
stmt_list = {stmt} ;
stmt = ';'
| Identifier '=' expr ';'
| 'while' paren_expr stmt
| 'if' paren_expr stmt ['else' stmt]
| 'print' '(' prt_list ')' ';'
| 'putc' paren_expr ';'
| '{' stmt_list '}'
;
paren_expr = '(' expr ')' ;
prt_list = (string | expr) {',' (String | expr)} ;
expr = and_expr {'||' and_expr} ;
and_expr = equality_expr {'&&' equality_expr} ;
equality_expr = relational_expr [('==' | '!=') relational_expr] ;
relational_expr = addition_expr [('<' | '<=' | '>' | '>=') addition_expr] ;
addition_expr = multiplication_expr {('+' | '-') multiplication_expr} ;
multiplication_expr = primary {('*' | '/' | '%') primary } ;
primary = Identifier
| Integer
| '(' expr ')'
| ('+' | '-' | '!') primary
得到的AST应该表示为二叉树。
示例
给定一个简单程序(如下),该程序存储在一个名为while.t的文件中,使用一种词法分析器创建标记列表。
lex < while.t > while.lex
运行一种语法分析器
parse < while.lex > while.ast
while.t
count = 1;
while (count < 10) {
print("count is: ", count, "\n");
count = count + 1;
}
while.lex
1 1 Identifier count
1 7 Op_assign
1 9 Integer 1
1 10 Semicolon
2 1 Keyword_while
2 7 LeftParen
2 8 Identifier count
2 14 Op_less
2 16 Integer 10
2 18 RightParen
2 20 LeftBrace
3 5 Keyword_print
3 10 LeftParen
3 11 String "count is: "
3 23 Comma
3 25 Identifier count
3 30 Comma
3 32 String "\n"
3 36 RightParen
3 37 Semicolon
4 5 Identifier count
4 11 Op_assign
4 13 Identifier count
4 19 Op_add
4 21 Integer 1
4 22 Semicolon
5 1 RightBrace
6 1 End_of_input
while.ast
Sequence
Sequence
;
Assign
Identifier count
Integer 1
While
Less
Identifier count
Integer 10
Sequence
Sequence
;
Sequence
Sequence
Sequence
;
Prts
String "count is: "
;
Prti
Identifier count
;
Prts
String "\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
详细说明
节点类型名称列表
Identifier String Integer Sequence If Prtc Prts Prti While Assign Negate Not Multiply Divide Mod
Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
在下文中,Null/Empty节点代表";"。
非根(内部)节点
对于操作符,应该创建以下节点:
Multiply Divide Mod Add Subtract Less LessEqual Greater GreaterEqual Equal NotEqual And Or
对于上面的每个节点,左子节点和右子节点是各自操作的操作数。
采用伪S-Expression格式:
(Operator expression expression)
Negate, Not
对于这些节点类型,左节点是操作数,右节点为空。
(Operator expression ;)
Sequence - 子节点是statement或Sequence。
If - 左节点是expression,右节点是if节点, 同时它的左节点是if-true语块右节点是if-false (else) 语块。
(If expression (If statement else-statement))
如果没有else, 这棵树变成:
(If expression (If statement ;))
Prtc
(Prtc (expression) ;)
Prts
(Prts (String "the string") ;)
Prti
(Prti (Integer 12345) ;)
While - 左节点是expression,右节点是statement.
(While expression statement)
Assign - 左节点是赋值的左边,右节点是赋值的右边。
(Assign Identifier expression)
终端(叶子)节点:
Identifier: (Identifier ident_name)
Integer: (Integer 12345)
String: (String "Hello World!")
";": Empty node
举例
实现以下程序:
a=11;
用二叉树来实现以下AST:
每个非叶子节点下有两个’|'行。第一个表示左子节点,第二个表示右子节点
(1) Sequence
(2) |-- ;
(3) |-- Assign
(4) |-- Identifier: a
(5) |-- Integer: 11
扁平化形式:
(1) Sequence
(2) ;
(3) Assign
(4) Identifier a
(5) Integer 11
实现以下程序:
a=11;
b=22;
c=33;
生成AST:
( 1) Sequence
( 2) |-- Sequence
( 3) | |-- Sequence
( 4) | | |-- ;
( 5) | | |-- Assign
( 6) | | |-- Identifier: a
( 7) | | |-- Integer: 11
( 8) | |-- Assign
( 9) | |-- Identifier: b
(10) | |-- Integer: 22
(11) |-- Assign
(12) |-- Identifier: c
(13) |-- Integer: 33
扁平化形式:
( 1) Sequence
( 2) Sequence
( 3) Sequence
( 4) ;
( 5) Assign
( 6) Identifier a
( 7) Integer 11
( 8) Assign
( 9) Identifier b
(10) Integer 22
(11) Assign
(12) Identifier c
(13) Integer 33
解析器的伪代码
使用优先级上升进行表达式解析,并使用递归下降进行语句解析
def expr(p)
if tok is "("
x = paren_expr()
elif tok in ["-", "+", "!"]
gettok()
y = expr(precedence of operator)
if operator was "+"
x = y
else
x = make_node(operator, y)
elif tok is an Identifier
x = make_leaf(Identifier, variable name)
gettok()
elif tok is an Integer constant
x = make_leaf(Integer, integer value)
gettok()
else
error()
while tok is a binary operator and precedence of tok >= p
save_tok = tok
gettok()
q = precedence of save_tok
if save_tok is not right associative
q += 1
x = make_node(Operator save_tok represents, x, expr(q))
return x
def paren_expr()
expect("(")
x = expr(0)
expect(")")
return x
def stmt()
t = NULL
if accept("if")
e = paren_expr()
s = stmt()
t = make_node(If, e, make_node(If, s, accept("else") ? stmt() : NULL))
elif accept("putc")
t = make_node(Prtc, paren_expr())
expect(";")
elif accept("print")
expect("(")
repeat
if tok is a string
e = make_node(Prts, make_leaf(String, the string))
gettok()
else
e = make_node(Prti, expr(0))
t = make_node(Sequence, t, e)
until not accept(",")
expect(")")
expect(";")
elif tok is ";"
gettok()
elif tok is an Identifier
v = make_leaf(Identifier, variable name)
gettok()
expect("=")
t = make_node(Assign, v, expr(0))
expect(";")
elif accept("while")
e = paren_expr()
t = make_node(While, e, stmt()
elif accept("{")
while tok not equal "}" and tok not equal end-of-file
t = make_node(Sequence, t, stmt())
expect("}")
elif tok is end-of-file
pass
else
error()
return t
def parse()
t = NULL
gettok()
repeat
t = make_node(Sequence, t, stmt())
until tok is end-of-file
return t
一旦构建了AST,就应该以扁平格式输出它。这可以像下面这样
def prt_ast(t)
if t == NULL
print(";\n")
else
print(t.node_type)
if t.node_type in [Identifier, Integer, String] # leaf node
print the value of the Ident, Integer or String, "\n"
else
print("\n")
prt_ast(t.left)
prt_ast(t.right)
如果正确地构建了AST,那么将它用以下操作加载到后续程序
def load_ast()
line = readline()
# Each line has at least one token
line_list = tokenize the line, respecting double quotes
text = line_list[0] # first token is always the node type
if text == ";" # a terminal node
return NULL
node_type = text # could convert to internal form if desired
# A line with two tokens is a leaf node
# Leaf nodes are: Identifier, Integer, String
# The 2nd token is the value
if len(line_list) > 1
return make_leaf(node_type, line_list[1])
left = load_ast()
right = load_ast()
return make_node(node_type, left, right)
最后,还可以通过对AST解释器解决方案之一运行AST来测试它。
假设它测试程序一个名为prime.t的文件中
lex <prime.t | parse
输入到词法分析器的代码:
/*
Simple prime number generator
*/
count = 1;
n = 1;
limit = 100;
while (n < limit) {
k=3;
p=1;
n=n+2;
while ((k*k<=n) && (p)) {
p=n/k*k!=n;
k=k+2;
}
if (p) {
print(n, " is prime\n");
count = count + 1;
}
}
print("Total primes found: ", count, "\n");
从词法分析器输出,并输入到语法分析器:
4 1 Identifier count
4 7 Op_assign
4 9 Integer 1
4 10 Semicolon
5 1 Identifier n
5 3 Op_assign
5 5 Integer 1
5 6 Semicolon
6 1 Identifier limit
6 7 Op_assign
6 9 Integer 100
6 12 Semicolon
7 1 Keyword_while
7 7 LeftParen
7 8 Identifier n
7 10 Op_less
7 12 Identifier limit
7 17 RightParen
7 19 LeftBrace
8 5 Identifier k
8 6 Op_assign
8 7 Integer 3
8 8 Semicolon
9 5 Identifier p
9 6 Op_assign
9 7 Integer 1
9 8 Semicolon
10 5 Identifier n
10 6 Op_assign
10 7 Identifier n
10 8 Op_add
10 9 Integer 2
10 10 Semicolon
11 5 Keyword_while
11 11 LeftParen
11 12 LeftParen
11 13 Identifier k
11 14 Op_multiply
11 15 Identifier k
11 16 Op_lessequal
11 18 Identifier n
11 19 RightParen
11 21 Op_and
11 24 LeftParen
11 25 Identifier p
11 26 RightParen
11 27 RightParen
11 29 LeftBrace
12 9 Identifier p
12 10 Op_assign
12 11 Identifier n
12 12 Op_divide
12 13 Identifier k
12 14 Op_multiply
12 15 Identifier k
12 16 Op_notequal
12 18 Identifier n
12 19 Semicolon
13 9 Identifier k
13 10 Op_assign
13 11 Identifier k
13 12 Op_add
13 13 Integer 2
13 14 Semicolon
14 5 RightBrace
15 5 Keyword_if
15 8 LeftParen
15 9 Identifier p
15 10 RightParen
15 12 LeftBrace
16 9 Keyword_print
16 14 LeftParen
16 15 Identifier n
16 16 Comma
16 18 String " is prime\n"
16 31 RightParen
16 32 Semicolon
17 9 Identifier count
17 15 Op_assign
17 17 Identifier count
17 23 Op_add
17 25 Integer 1
17 26 Semicolon
18 5 RightBrace
19 1 RightBrace
20 1 Keyword_print
20 6 LeftParen
20 7 String "Total primes found: "
20 29 Comma
20 31 Identifier count
20 36 Comma
20 38 String "\n"
20 42 RightParen
20 43 Semicolon
21 1 End_of_input
语法分析器的输出:
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier count
Integer 1
Assign
Identifier n
Integer 1
Assign
Identifier limit
Integer 100
While
Less
Identifier n
Identifier limit
Sequence
Sequence
Sequence
Sequence
Sequence
;
Assign
Identifier k
Integer 3
Assign
Identifier p
Integer 1
Assign
Identifier n
Add
Identifier n
Integer 2
While
And
LessEqual
Multiply
Identifier k
Identifier k
Identifier n
Identifier p
Sequence
Sequence
;
Assign
Identifier p
NotEqual
Multiply
Divide
Identifier n
Identifier k
Identifier k
Identifier n
Assign
Identifier k
Add
Identifier k
Integer 2
If
Identifier p
If
Sequence
Sequence
;
Sequence
Sequence
;
Prti
Identifier n
;
Prts
String " is prime\n"
;
Assign
Identifier count
Add
Identifier count
Integer 1
;
Sequence
Sequence
Sequence
;
Prts
String "Total primes found: "
;
Prti
Identifier count
;
Prts
String "\n"
;