2023年2月16日 05:59:48go评论62阅读模式

英文:

Using makeExprParser with ambiguity

问题

我正在尝试将一个基于 CFG 的工具（antlr）中的解析器转换到 Megaparsec 时遇到了问题。

语法包含了用尖括号（<，>）括起来并用逗号分隔的表达式列表（使用 makeExprParser 处理）。

像 <>、<23>、<23,87> 等等。

问题在于这些表达式本身可能包含 > 运算符（表示“大于”），这会导致我的解析器失败。
例如，<1223>234> 应该解析为 [BinaryExpression ">>" (IntExpr 1223) (IntExpr 234)]。

我认为我需要在某个地方巧妙地使用 try，但我尝试过的地方（sepBy 的第一个参数和 makeExprParser 的第一个参数）不幸没有起作用。

在这种情况下，我是否可以在 Megaparsec 中使用 makeExprParser，还是我必须手动编写表达式解析器？

这是我的解析器的相关部分：

-- 使用 megaparsec、text 和 parser-combinators
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Monad.Combinators.Expr
import Data.Text
import Data.Void
import System.Environment
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type BinaryOperator = Text

type Name = Text

data Expr
  = IntExpr Integer
  | BinaryExpression BinaryOperator Expr Expr
  deriving (Eq, Show)

type Parser = Parsec Void Text

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: Text -> Parser Text
symbol = L.symbol sc

sc :: Parser ()
sc = L.space space1 (L.skipLineComment "//") (L.skipBlockCommentNested "/*" "*/")

parseInteger :: Parser Expr
parseInteger = do
  number <- some digitChar
  _ <- sc
  return $ IntExpr $ read number

parseExpr :: Parser Expr
parseExpr = makeExprParser parseInteger [[InfixL (BinaryExpression ">&gt;" <$ symbol ">&gt;")]]

parseBracketList :: Parser [Expr]
parseBracketList = do
  _ <- symbol "&lt;"
  exprs <- sepBy parseExpr (symbol ",")
  _ <- symbol "&gt;"
  return exprs

main :: IO ()
main = do
  text : _ <- getArgs
  let res = runParser parseBracketList "stdin" (pack text)
  case res of
    (Right suc) -> do
      print suc
    (Left err) ->
      putStrLn $ errorBundlePretty err

希望这对你有所帮助。

英文:

I'm currently encountering a problem while translating a parser from a CFG-based tool (antlr) to Megaparsec.

The grammar contains lists of expressions (handled with makeExprParser) that are enclosed in brackets (<, >) and separated by ,.

Stuff like <>, <23>, <23,87> etc.

The problem now is that the expressions may themselves contain the > operator (meaning "greater than"), which causes my parser to fail.
<1223>234> should, for example, be parsed into [BinaryExpression ">" (IntExpr 1223) (IntExpr 234)].

I presume that I have to strategically place try somewhere, but the places I tried (to the first argument of sepBy and the first argument of makeExprParser) did unfortunately not work.

Can I use makeExprParser in such a situation or do I have to manually write the expression parser?:

This is the relevant part of my parser:

-- uses megaparsec, text, and parser-combinators
{-# LANGUAGE OverloadedStrings #-}

module Main where

import Control.Monad.Combinators.Expr
import Data.Text
import Data.Void
import System.Environment
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type BinaryOperator = Text

type Name = Text

data Expr
  = IntExpr Integer
  | BinaryExpression BinaryOperator Expr Expr
  deriving (Eq, Show)

type Parser = Parsec Void Text

lexeme :: Parser a -&gt; Parser a
lexeme = L.lexeme sc

symbol :: Text -&gt; Parser Text
symbol = L.symbol sc

sc :: Parser ()
sc = L.space space1 (L.skipLineComment &quot;//&quot;) (L.skipBlockCommentNested &quot;/*&quot; &quot;*/&quot;)

parseInteger :: Parser Expr
parseInteger = do
  number &lt;- some digitChar
  _ &lt;- sc
  return $ IntExpr $ read number

parseExpr :: Parser Expr
parseExpr = makeExprParser parseInteger [[InfixL (BinaryExpression &quot;&gt;&quot; &lt;$ symbol &quot;&gt;&quot;)]]

parseBracketList :: Parser [Expr]
parseBracketList = do
  _ &lt;- symbol &quot;&lt;&quot;
  exprs &lt;- sepBy parseExpr (symbol &quot;,&quot;)
  _ &lt;- symbol &quot;&gt;&quot;
  return exprs

main :: IO ()
main = do
  text : _ &lt;- getArgs
  let res = runParser parseBracketList &quot;stdin&quot; (pack text)
  case res of
    (Right suc) -&gt; do
      print suc
    (Left err) -&gt;
      putStrLn $ errorBundlePretty err

答案1

得分: 2

您可能误诊了问题。您的解析器在<1233>234>上失败，因为它试图将>解析为左结合运算符，就像+一样。换句话说，就像：

1+2+

会失败，因为第二个+没有右操作数，您的解析器之所以失败，是因为：

1233&gt;234&gt;

在第二个>之后没有数字。假设您不希望您的>运算符链接（即1>2>3不是有效的Expr），您应该首先在makeExprParser表中将InfixL替换为InfixN（非关联）。然后，它将可以成功解析此示例。

不幸的是，无论是否进行此更改，您的解析器仍然会在更简单的测试案例上失败：

&lt;1233&gt;

因为>被解释为继续表达式内的运算符。

换句话说，问题不是您的解析器无法处理带有>字符的表达式，而是它在处理>字符时过于激进，阻止它们被识别为结束的角括号。

要修复这个问题，您需要准确地弄清楚您正在解析的内容。具体来说，您需要通过精确描述>何时可以作为继续表达式的一部分以及何时不能来解决解析器中的歧义。

可能有效的一个规则是，只有在其后跟一个有效的“term”（即parseInteger）时，才将>视为运算符。您可以使用lookAhead来实现这一点。以下解析器：

symbol ">"

只会在后面跟一个有效的term时解析>运算符。如果未找到term，它将消耗一些输入（至少是>符号本身），因此您必须将其包装在try中：

try (symbol ">"

将上述两个修复应用于parseExpr后：

parseExpr :: Parser Expr
parseExpr = makeExprParser term
[[InfixN (BinaryExpression ">" <$ try (symbol ">" <* lookAhead term))]]
where term = parseInteger

您将获得以下解析结果：

λ> parseTest parseBracketList "<23>"
[IntExpr 23]
λ> parseTest parseBracketList "<23,87>"
[IntExpr 23,IntExpr 87]
λ> parseTest parseBracketList "<23,87>18>"
[IntExpr 23,BinaryExpression ">" (IntExpr 87) (IntExpr 18)]

但以下会失败：

λ> parseTest parseBracketList "<23,87>18"
1:10:
|
1 | <23,87>18
|          ^
unexpected end of input
expecting ',', '>', or digit
λ>

因为>后跟18意味着它是有效的运算符，它的失败解析是因为有效表达式87>18之后既不跟逗号也不跟结束的>角括号。

如果您需要解析类似<23,87>18的内容，那么您可能会遇到更大的问题。考虑以下两个测试案例：

<1,2>3,4,5,6,7,...,100000000000,100000000001>
<1,2>3,4,5,6,7,...,100000000000,100000000001

编写一个有效的解析器，可以将第一个解析为10000000000个表达式的列表，但将第二个解析为两个表达式的列表：

[IntExpr 1, IntExpr 2]

然后是一些“额外”文本。希望您要解析的底层“语言”不会如此严重地破裂，以至于这将成为一个问题。

英文:

You've (probably) misdiagnosed the problem. Your parser fails on <1233>234> because it's trying to parse > as a left associative operator, like +. In other words, the same way:

1+2+

would fail, because the second + has no right-hand operand, your parser is failing because:

1233&gt;234&gt;

has no digit following the second >. Assuming you don't want your > operator to chain (i.e., 1>2>3 is not a valid Expr), you should first replace InfixL with InfixN (non-associative) in your makeExprParser table. Then, it will parse this example fine.

Unfortunately, with or without this change your parser will still fail on the simpler test case:

&lt;1233&gt;

because the > is interpreted as an operator within a continuing expression.

In other words, the problem isn't that your parser can't handle expressions with > characters, it's that it's overly aggressive in treating > characters as part of an expression, preventing them from being recognized as the closing angle bracket.

To fix this, you need to figure out exactly what you're parsing. Specifically, you need to resolve the ambiguity in your parser by precisely characterizing the situations where > can be part of a continuing expression and where it can't.

One rule that will probably work is to only consider a > as an operator if it is followed by a valid "term" (i.e., a parseInteger). You can do this with lookAhead. The parser:

symbol &quot;&gt;&quot; &lt;* lookAhead term

will parse a > operator only if it is followed by a valid term. If it fails to find a term, it will consume some input (at least the > symbol itself), so you must surround it with a try:

try (symbol &quot;&gt;&quot; &lt;* lookAhead term)

With the above two fixes applied to parseExpr:

parseExpr :: Parser Expr
parseExpr = makeExprParser term
[[InfixN (BinaryExpression &quot;&gt;&quot; &lt;$ try (symbol &quot;&gt;&quot; &lt;* lookAhead term))]]
where term = parseInteger

you'll get the following parses:

λ&gt; parseTest parseBracketList &quot;&lt;23&gt;&quot;
[IntExpr 23]
λ&gt; parseTest parseBracketList &quot;&lt;23,87&gt;&quot;
[IntExpr 23,IntExpr 87]
λ&gt; parseTest parseBracketList &quot;&lt;23,87&gt;18&gt;&quot;
[IntExpr 23,BinaryExpression &quot;&gt;&quot; (IntExpr 87) (IntExpr 18)]

However, the following will fail:

λ&gt; parseTest parseBracketList &quot;&lt;23,87&gt;18&quot;
1:10:
|
1 | &lt;23,87&gt;18
|          ^
unexpected end of input
expecting &#39;,&#39;, &#39;&gt;&#39;, or digit
λ&gt;

because the fact that the > is followed by 18 means that it is a valid operator, and it is parse failure that the valid expression 87>18 is followed by neither a comma nor a closing > angle bracket.

If you need to parse something like <23,87>18, you have bigger problems. Consider the following two test cases:

&lt;1,2&gt;3,4,5,6,7,...,100000000000,100000000001&gt;
&lt;1,2&gt;3,4,5,6,7,...,100000000000,100000000001

It's a challenge to write an efficient parser that will parse the first one as a list of 10000000000 expressions but the second one as a list of two expression:

[IntExpr 1, IntExpr 2]

followed by some "extra" text. Hopefully, the underlying "language" you're trying to parse isn't so hopelessly broken that this will be an issue.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

使用makeExprParser处理歧义

问题

答案1

Why does my Antlr give error: rule prog contains a closure with at least one alternative that can match an empty string to my grammar

如何使用GHC API动态编译并导入一个模块 – LTS Haskell 20.20（ghc-9.2.7）？

如何将ThreadId转换为(LWP) PID？

Go中的上下文无关文法（CFG）解析器

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论