I'm having a problem with creating a lexer using PLY in Python
Question
I recently tried to create a lexer, and it isn't going well.
The problem is that it throws an error saying "Can't build lexer". Here's the traceback:
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
Traceback (most recent call last):
File "...\Lexer.py", line 24, in <module>
lexer = lex.lex()
^^^^^^^^^
File "...\lex.py", line 910, in lex
raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer
I think it's because of my `t_error()` function. I also suspect that the tokens I've defined may have a problem. Please help me with this. I know it's kind of a dumb question, but I'm new, so please be nice to me.
By the way, here's the source code:
```python
import ply.lex as lex
import ply.yacc as yacc
import sys

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
]

t_INT = r"\d+"
t_ID = r"[a-zA-Z_][a-zA-Z0-9_]*"
t_PLUS = r"+"
t_MINUS = r"-"
t_TIMES = r"*"
t_DIVIDE = r"/"

def t_error(t):
    print("Illegal character '%s'" % t.lexer.lexeme, file=sys.stderr)

lexer = lex.lex()

def p_expression(p):
    """expression : INT
                  | ID
                  | expression PLUS expression
                  | expression MINUS expression
                  | expression TIMES expression
                  | expression DIVIDE expression"""
    if len(p) == 2:
        if isinstance(p[1], int):
            p[0] = p[1]
        elif isinstance(p[1], str):
            p[0] = p[1]
    else:
        if p[2] == "+":
            p[0] = p[1] + p[3]
        elif p[2] == "-":
            p[0] = p[1] - p[3]
        elif p[2] == "*":
            p[0] = p[1] * p[3]
        elif p[2] == "/":
            p[0] = p[1] / p[3]

parser = yacc.yacc()

def test(text):
    try:
        result = parser.parse(text)
        if result:
            print(result)
        else:
            print("Empty expression")
    except yacc.YaccError:
        print("Error parsing input")

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123 + 456")
    test("123 - 456")
    test("123 * 456")
    test("123 / 456")
```
Maybe I'm just being stupid, but I can't get it to run.
Answer 1
Score: 0
These errors...
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
...seem pretty clear. You haven't defined tokens named `TIMES` or `DIVIDE` in your `tokens` list. You need:
tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
    "TIMES",
    "DIVIDE",
]
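As a rough illustration of the rule PLY is enforcing here, the sketch below compares `t_`-prefixed rule names against the `tokens` list. The helper `undeclared_rule_tokens` is hypothetical (it is not part of PLY), and it deliberately ignores refinements such as lexer states or `t_ignore_`-prefixed rules.

```python
def undeclared_rule_tokens(namespace, tokens):
    """Return token names that have a t_ rule but are missing from tokens.

    Hypothetical helper for illustration only; PLY performs its own,
    more thorough validation when lex.lex() is called.
    """
    special = {"t_error", "t_ignore", "t_eof"}  # PLY names that are not token rules
    declared = set(tokens)
    missing = []
    for name in namespace:
        if not name.startswith("t_") or name in special:
            continue
        token_name = name[2:]  # strip the "t_" prefix
        if token_name not in declared:
            missing.append(token_name)
    return sorted(missing)


# Rule names taken from the question; the values do not matter for this check.
rules = {"t_INT": None, "t_ID": None, "t_PLUS": None, "t_MINUS": None,
         "t_TIMES": None, "t_DIVIDE": None, "t_error": None}
tokens = ["INT", "ID", "PLUS", "MINUS", "EOF"]

print(undeclared_rule_tokens(rules, tokens))  # ['DIVIDE', 'TIMES']
```

Adding `"TIMES"` and `"DIVIDE"` to `tokens` makes the helper return an empty list, which mirrors the fix above.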
Once you fix those errors, you will get:
ERROR: Invalid regular expression for rule 't_PLUS'. nothing to repeat at position 11
ERROR: Invalid regular expression for rule 't_TIMES'. nothing to repeat at position 12
That's because the characters `+` and `*` are both regex quantifiers (metacharacters), so you need to escape them if you want the literal character:
t_PLUS = r"\+"
t_TIMES = r"\*"
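The same failure can be reproduced with the standard `re` module, without PLY. This is only a sketch of the underlying regex behaviour. The positions 11 and 12 in PLY's messages appear to come from PLY wrapping each pattern in a named group such as `(?P<t_PLUS>...)` before compiling it, so the offsets reported by plain `re` will differ.

```python
import re

# A bare quantifier has nothing in front of it to repeat, so compilation fails.
for pattern in (r"+", r"*"):
    try:
        re.compile(pattern)
    except re.error as exc:
        print(f"{pattern!r} is invalid: {exc}")

# Escaping by hand, or letting re.escape add the backslash, gives a literal match.
plus = re.compile(r"\+")
times = re.compile(re.escape("*"))

print(bool(plus.fullmatch("+")), bool(times.fullmatch("*")))  # True True
```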
Once you fix those errors, you'll ultimately get this from your `t_error` function:
AttributeError: 'Lexer' object has no attribute 'lexeme'. Did you mean: 'lexre'?
There doesn't appear to be a `lexeme` attribute, but you can use `t.value`, which inside `t_error` holds the rest of the input starting at the offending character:
def t_error(t):
    print("Illegal character '%s'" % t.value, file=sys.stderr)
Having fixed that error, you will now get:
123
hello
Illegal character ' + 456'
.
.
.
ply.lex.LexError: Scanning error. Illegal character ' '
You have spaces in your expressions, but you haven't accounted for this in your rules. The quick fix is to remove the spaces in your test expressions:
if __name__ == "__main__":
    test("123")
    test("hello")
    test("123+456")
    test("123-456")
    test("123*456")
    test("123/456")
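If you would rather keep the spaces in the input, PLY also lets the lexer skip them through the special `t_ignore` string. The sketch below assumes PLY is installed and uses a cut-down token set (`INT` and `PLUS` only) purely for illustration. It also calls `t.lexer.skip(1)` in `t_error` so that one bad character does not abort scanning.

```python
import sys

import ply.lex as lex

# Cut-down token set for illustration; the question's lexer has more tokens.
tokens = ["INT", "PLUS"]

t_INT = r"\d+"
t_PLUS = r"\+"

# Characters listed here are skipped silently between tokens.
t_ignore = " \t"


def t_error(t):
    # t.value is the remaining input, so t.value[0] is the offending character.
    print("Illegal character '%s'" % t.value[0], file=sys.stderr)
    t.lexer.skip(1)  # move past the bad character and keep scanning


lexer = lex.lex()
lexer.input("123 + 456")
kinds = [(tok.type, tok.value) for tok in lexer]
print(kinds)
```

With `t_ignore` in place, the original test strings such as `"123 + 456"` can stay as they are.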
Having fixed that error, you'll now get:
123
hello
123456
.
.
.
TypeError: unsupported operand type(s) for -: 'str' and 'str'
And that's because your `p_expression` function is applying arithmetic operators to string values: `+` silently concatenated "123" and "456" into 123456, and `-` is not defined for strings at all. You need to convert the values to numbers before applying arithmetic operators. The easiest solution is to replace your definition of `t_INT` with this function:
def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t
And now running the code produces:
123
hello
579
-333
56088
0.26973684210526316
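To see in isolation why the `int` conversion matters, here is a plain-Python sketch (no PLY involved) of the two behaviours that showed up earlier: `+` concatenating strings and `-` rejecting them.

```python
left, right = "123", "456"

print(left + right)            # 123456 (string concatenation)
print(int(left) + int(right))  # 579 (integer addition)

try:
    left - right
except TypeError as exc:
    # Strings do not support subtraction, which is the error the parser hit.
    print(f"TypeError: {exc}")
```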