2023-07-06 22:30:07

I'm having a problem with creating a lexer using PLY in Python

Question

I recently tried to create a lexer, and it isn't working.

It throws an error message saying "Can't build lexer". Here's the traceback:

ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE
Traceback (most recent call last):
  File "...\Lexer.py", line 24, in <module>
    lexer = lex.lex()
            ^^^^^^^^^
  File "...\lex.py", line 910, in lex
    raise SyntaxError("Can't build lexer")
SyntaxError: Can't build lexer

I'm aware that it's because of my `t_error()` function. I also sense that the tokens I've defined may have a problem. Please help me with that. I know this is kind of dumb, but I'm new, so please be nice to me.
Btw, here's the source code:

```python
import ply.lex as lex
import ply.yacc as yacc

import sys

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
]

t_INT = r"\d+"
t_ID = r"[a-zA-Z_][a-zA-Z0-9_]*"
t_PLUS = r"+"
t_MINUS = r"-"
t_TIMES = r"*"
t_DIVIDE = r"/"

def t_error(t):
    print("Illegal character '%s'" % t.lexer.lexeme, file=sys.stderr)

lexer = lex.lex()

def p_expression(p):
    """expression : INT
                 | ID
                 | expression PLUS expression
                 | expression MINUS expression
                 | expression TIMES expression
                 | expression DIVIDE expression"""
    if len(p) == 2:
        if isinstance(p[1], int):
            p[0] = p[1]
        elif isinstance(p[1], str):
            p[0] = p[1]
    else:
        if p[2] == "+":
            p[0] = p[1] + p[3]
        elif p[2] == "-":
            p[0] = p[1] - p[3]
        elif p[2] == "*":
            p[0] = p[1] * p[3]
        elif p[2] == "/":
            p[0] = p[1] / p[3]

parser = yacc.yacc()

def test(text):
    try:
        result = parser.parse(text)
        if result:
            print(result)
        else:
            print("Empty expression")
    except yacc.YaccError:
        print("Error parsing input")

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123 + 456")
    test("123 - 456")
    test("123 * 456")
    test("123 / 456")
```

Maybe I'm just stupid, but because of that I can't get it to run.

Answer 1

Score: 0

These errors...

ERROR: Rule 't_TIMES' defined for an unspecified token TIMES
ERROR: Rule 't_DIVIDE' defined for an unspecified token DIVIDE

...seem pretty clear. You haven't defined the tokens named TIMES or DIVIDE in your tokens array. You need:

tokens = [
    "INT",
    "ID",
    "PLUS",
    "MINUS",
    "EOF",
    "TIMES",
    "DIVIDE",
]
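
As a rough way to spot this kind of mismatch yourself, the check PLY is complaining about can be approximated in plain Python. This is only a sketch: `undeclared_tokens`, `rule_names`, and `SPECIAL` are hypothetical names for illustration and are not part of PLY.

```python
# Names declared in the question's original `tokens` list.
tokens = ["INT", "ID", "PLUS", "MINUS", "EOF"]

# Names of the module-level rules defined in the question's lexer.
rule_names = ["t_INT", "t_ID", "t_PLUS", "t_MINUS", "t_TIMES", "t_DIVIDE", "t_error"]

# Special names that are not token rules (illustrative subset, an assumption).
SPECIAL = {"t_error", "t_ignore"}

def undeclared_tokens(rule_names, tokens):
    """Return token names that have a t_ rule but are absent from `tokens`."""
    declared = set(tokens)
    return [
        name[2:]  # strip the "t_" prefix to get the token name
        for name in rule_names
        if name not in SPECIAL and name[2:] not in declared
    ]

print(undeclared_tokens(rule_names, tokens))  # ['TIMES', 'DIVIDE']
```

The two names it reports are the same two that appear in the error messages.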

Once you fix those errors, you will get:

ERROR: Invalid regular expression for rule 't_PLUS'. nothing to repeat at position 11
ERROR: Invalid regular expression for rule 't_TIMES'. nothing to repeat at position 12

That's because the characters + and * are both regex quantifiers (they mean "repeat the preceding item"), so you need to escape them if you want the literal character:

t_PLUS = r"\+"
t_TIMES = r"\*"
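
The same failure can be reproduced with the standard `re` module alone, which may make it easier to experiment with. This is a small sketch; `try_compile` is a hypothetical helper, and the named-group wrapper is only an assumption about how PLY embeds each rule, chosen because it would explain why the reported positions are 11 and 12 rather than 0.

```python
import re

def try_compile(pattern):
    """Return None if `pattern` compiles, otherwise the re.error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)

# A bare quantifier has nothing before it to repeat, so these fail.
print(try_compile("+"))
print(try_compile("*"))

# Wrapped in a named group (assumed here to mimic PLY's internal wrapping),
# the error position moves past the group prefix.
print(try_compile("(?P<t_PLUS>+)"))

# Escaped, the same characters are matched literally.
print(try_compile(r"\+"))   # None
print(re.escape("*"))       # \*
```

`re.escape` is handy when the literal text comes from somewhere else; for a fixed single character, writing the backslash by hand as above is just as good.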

Once you fix those errors, you'll ultimately get this from your t_error method:

AttributeError: 'Lexer' object has no attribute 'lexeme'. Did you mean: 'lexre'?

There doesn't appear to be a lexeme attribute, but you can use t.value, which inside t_error holds the rest of the input that has not been tokenized yet (so t.value[0] is the offending character):

def t_error(t):
    print("Illegal character '%s'" % t.value, file=sys.stderr)

Having fixed that error, you will now get:

123
hello
Illegal character ' + 456'
.
.
.
ply.lex.LexError: Scanning error. Illegal character ' '

You have spaces in your expressions, but you haven't accounted for this in your rules. The quick fix is to remove the spaces in your test expressions:

if __name__ == "__main__":
    test("123")
    test("hello")
    test("123+456")
    test("123-456")
    test("123*456")
    test("123/456")
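
If you would rather keep the spaces, the usual alternative is to have the lexer skip whitespace between tokens. In PLY this is done with the module-level `t_ignore` string (for example `t_ignore = " \t"`). The sketch below shows the same idea using only the standard `re` module, so it runs without PLY; `TOKEN_SPEC`, `MASTER`, and `tokenize` are hypothetical names and not part of PLY.

```python
import re

# Token patterns mirroring the rules in the question (names are illustrative).
TOKEN_SPEC = [
    ("INT", r"\d+"),
    ("ID", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("PLUS", r"\+"),
    ("MINUS", r"-"),
    ("TIMES", r"\*"),
    ("DIVIDE", r"/"),
    ("SKIP", r"[ \t]+"),  # whitespace: matched, then discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Return (type, value) pairs, silently dropping spaces and tabs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"Illegal character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

print(tokenize("123 + 456"))  # [('INT', '123'), ('PLUS', '+'), ('INT', '456')]
```

With whitespace skipped, `"123 + 456"` and `"123+456"` produce the same token stream, so the test expressions would not need to change.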

Having fixed that error, you'll now get:

123
hello
123456
.
.
.
TypeError: unsupported operand type(s) for -: 'str' and 'str'

And that's because the token values are still strings when they reach your p_expression method: + concatenates them (hence 123456), and - isn't defined for strings at all. You need to convert them to numbers before applying arithmetic operators. The easiest solution is to replace your definition of t_INT with this function:

def t_INT(t):
    r'\d+'
    t.value = int(t.value)
    return t

And now running the code produces:

123
hello
579
-333
56088
0.26973684210526316
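
The difference between the `123456` output with its `TypeError` and the final numeric results comes down to string versus integer operands, which can be checked without PLY. A minimal sketch, where `left` and `right` are hypothetical stand-ins for the values of two INT tokens:

```python
# Token values arrive as strings unless the lexer converts them.
left, right = "123", "456"

print(left + right)  # string concatenation: 123456

try:
    left - right  # subtraction is not defined for str
except TypeError as exc:
    print(f"TypeError: {exc}")

# Converting first, as the t_INT function rule does, gives real arithmetic.
a, b = int(left), int(right)
print(a + b, a - b, a * b, a / b)
```

Doing the conversion once in the lexer rule keeps `p_expression` free of type handling, which is why the answer puts `int(t.value)` in `t_INT` rather than in the grammar action.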
