I'm having a problem creating a lexer using PLY in Python

Question

I recently tried to create a lexer, and it isn't working out well.

The problem is that it throws an error saying "Can't build lexer". Here's the traceback:

```
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
Traceback (most recent call last):
  File "...\Lexer.py", line 24, in <module>
    lexer = lex.lex()
            ^^^^^^^^^
  File "...\lex.py", line 910, in lex
    raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer
```

I think it's because of my `t_error()` function, and I also suspect the tokens I've defined might have a problem. Please help me with this. I know it's kind of dumb, but I'm new, so please be nice to me.
By the way, here's the source code:

```python
import ply.lex as lex
import ply.yacc as yacc

import sys

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
]

t_INT = r"\d+"
t_ID = r"[a-zA-Z_][a-zA-Z0-9_]*"
t_PLUS = r"+"
t_MINUS = r"-"
t_TIMES = r"*"
t_DIVIDE = r"/"

def t_error(t):
    print("Illegal character '%s'" % t.lexer.lexeme, file=sys.stderr)

lexer = lex.lex()

def p_expression(p):
    """expression : INT
                 | ID
                 | expression PLUS expression
                 | expression MINUS expression
                 | expression TIMES expression
                 | expression DIVIDE expression"""
    if len(p) == 2:
        if isinstance(p[1], int):
            p[0] = p[1]
        elif isinstance(p[1], str):
            p[0] = p[1]
    else:
        if p[2] == "+":
            p[0] = p[1] + p[3]
        elif p[2] == "-":
            p[0] = p[1] - p[3]
        elif p[2] == "*":
            p[0] = p[1] * p[3]
        elif p[2] == "/":
            p[0] = p[1] / p[3]

parser = yacc.yacc()

def test(text):
    try:
        result = parser.parse(text)
        if result:
            print(result)
        else:
            print("Empty expression")
    except yacc.YaccError:
        print("Error parsing input")

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123 + 456")
    test("123 - 456")
    test("123 * 456")
    test("123 / 456")
```

Maybe I'm just stupid, but because of this I can't get it to run.

Answer 1

Score: 0

These errors...

```
ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
```

...seem pretty clear. You haven't defined tokens named `TIMES` or `DIVIDE` in your `tokens` list. You need:

```python
tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
    "TIMES",
    "DIVIDE",
]
```
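PLY treats every module-level name starting with `t_` as a lexer rule and expects the part after the prefix to appear in `tokens`; that is the check `t_TIMES` and `t_DIVIDE` failed. The sketch below re-creates that check in plain Python so you can see the mechanism. It is a simplification, not PLY's code: the helper `undeclared_rules` is hypothetical, it ignores lexer states and `t_ignore_*` rules, and it treats only `t_error`, `t_ignore` and `t_eof` as special names.

```python
# Simplified re-creation of the "rule defined for an unspecified token" check.
# Hypothetical helper for illustration; not part of PLY.
SPECIAL_RULES = {"t_error", "t_ignore", "t_eof"}


def undeclared_rules(names, tokens):
    """Return the t_* rule names whose token is missing from `tokens`."""
    bad = []
    for name in names:
        if not name.startswith("t_") or name in SPECIAL_RULES:
            continue
        token_name = name[2:]  # "t_TIMES" -> "TIMES"
        if token_name not in tokens:
            bad.append(name)
    return sorted(bad)


# Rule names from the question, checked against both tokens lists.
rules = ["t_INT", "t_ID", "t_PLUS", "t_MINUS", "t_TIMES", "t_DIVIDE", "t_error"]
original_tokens = ["INT", "ID", "PLUS", "MINUS", "EOF"]
fixed_tokens = original_tokens + ["TIMES", "DIVIDE"]

print(undeclared_rules(rules, original_tokens))  # ['t_DIVIDE', 't_TIMES']
print(undeclared_rules(rules, fixed_tokens))     # []
```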

Once you fix those errors, you will get:

```
ERROR: Invalid regular expression for rule 't_PLUS'. nothing to repeat at position 11
ERROR: Invalid regular expression for rule 't_TIMES'. nothing to repeat at position 12
```

That's because `+` and `*` are regex metacharacters (quantifiers that repeat whatever precedes them), so a pattern consisting of just `+` or `*` has nothing to repeat. Escape them if you want the literal characters:

```python
t_PLUS = r"\+"
t_TIMES = r"\*"
```
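The odd positions 11 and 12 in those messages appear to come from how PLY validates a string rule: it wraps the pattern in a named group such as `(?P<t_PLUS>...)` before compiling it, so the bare `+` lands at index 11 (and `*` at index 12 behind the longer `t_TIMES` prefix). This sketch reproduces that with the standard `re` module only; the helper `rule_error_position` is hypothetical. `re.escape` is a convenient way to get the escaped form of an operator character.

```python
import re


def rule_error_position(name, pattern):
    """Compile `pattern` inside a named group, roughly as PLY does for string rules.

    Returns the position reported by re.error, or None if the regex is valid.
    """
    try:
        re.compile(f"(?P<{name}>{pattern})")
    except re.error as exc:
        return exc.pos
    return None


print(rule_error_position("t_PLUS", r"+"))    # 11: '+' has nothing to repeat
print(rule_error_position("t_TIMES", r"*"))   # 12
print(rule_error_position("t_PLUS", r"\+"))   # None: escaped, matches a literal '+'

# re.escape produces the escaped form for you.
print(re.escape("+"), re.escape("*"))         # \+ \*
```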

Once you fix those errors, you'll ultimately get this from your `t_error` function:

```
AttributeError: 'Lexer' object has no attribute 'lexeme'. Did you mean: 'lexre'?
```

There doesn't appear to be a `lexeme` attribute, but you can use `t.value`:

```python
def t_error(t):
    print("Illegal character '%s'" % t.value, file=sys.stderr)
```

Having fixed that error, you will now get:

```
123
hello
Illegal character ' + 456'
...
ply.lex.LexError: Scanning error. Illegal character ' '
```

You have spaces in your expressions, but you haven't accounted for this in your rules. The quick fix is to remove the spaces in your test expressions:

```python
if __name__ == "__main__":
    test("123")
    test("hello")
    test("123+456")
    test("123-456")
    test("123*456")
    test("123/456")
```
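Removing the spaces works for these tests, but in PLY the usual approach is to declare `t_ignore = " \t"` so spaces and tabs are skipped between tokens, and to call `t.lexer.skip(1)` in `t_error` so the lexer moves past a bad character instead of raising `LexError`. Because the error token's `t.value` holds the rest of the input from the bad character onward, printing `t.value[0]` shows only the offending character. The sketch below imitates that behaviour with a hand-written scanner on the standard `re` module so it runs without PLY; the `tokenize` function and its `(tokens, errors)` return format are hypothetical, not PLY's API.

```python
import re

# Token patterns mirroring the question's t_* rules, with operators escaped.
# Alternatives are tried left to right, so list order decides ties.
TOKEN_PATTERNS = [
    ("INT", r"\d+"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
]
# One master regex with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))
IGNORE = " \t"  # plays the role of PLY's t_ignore


def tokenize(text):
    """Return (tokens, errors); each token is a (type, value) pair."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORE:  # skip whitespace between tokens
            pos += 1
            continue
        m = MASTER.match(text, pos)
        if m is None:
            # Like a t_error that reports one character and then calls
            # t.lexer.skip(1): record it, advance, and keep scanning.
            errors.append(text[pos])
            pos += 1
            continue
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors


print(tokenize("123 + 456"))  # ([('INT', '123'), ('PLUS', '+'), ('INT', '456')], [])
print(tokenize("1 $ 2"))      # ([('INT', '1'), ('INT', '2')], ['$'])
```

Advancing by one character on error keeps the scanner from getting stuck on the same position, which is exactly the situation PLY reports as "Scanning error".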

Having fixed that error, you'll now get:

```
123
hello
123456
...
TypeError: unsupported operand type(s) for -: 'str' and 'str'
```

And that's because you're applying arithmetic operators to string values in your `p_expression` function. You need to convert them to numbers first. The easiest solution is to replace your definition of `t_INT` with this function:

```python
def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t
```
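Both symptoms come straight from Python's operators on `str` values: `+` concatenates strings, which is why `123+456` printed `123456`, and `-` isn't defined for strings at all. Once `t_INT` converts the matched text with `int()`, the same operators do arithmetic. The `evaluate` helper below is a hypothetical stand-in for the `else` branch of `p_expression`.

```python
import operator

# Maps the operator characters used in p_expression to Python functions.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(left, op, right):
    """Apply one binary operator, like the else branch of p_expression."""
    return OPS[op](left, right)


# With string token values (the original t_INT = r"\d+"):
print(evaluate("123", "+", "456"))  # 123456  (string concatenation)
try:
    evaluate("123", "-", "456")
except TypeError as exc:
    print("TypeError:", exc)  # unsupported operand type(s) for -: 'str' and 'str'

# With values converted by int(), as the t_INT function does:
for op in "+-*/":
    print(evaluate(int("123"), op, int("456")))
```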

And now running the code produces:

```
123
hello
579
-333
56088
0.26973684210526316
```

huangapple
  • Posted on July 6, 2023 22:30:07
  • When reposting, please keep this link: https://go.coder-hub.com/76629905.html