Parse a string quoted by " in Rust using LALRPOP

Question

How can I parse a string quoted by " in Rust using LALRPOP?

Str: Vec<u8> = {
    "\"" <s:r"([^\"])*"> "\"" => {
        s.bytes().collect()
    },
};

This doesn't work. I tried putting \ in different places, but I haven't figured it out, since the docs say little about regexes.

Answer 1

Score: 2

LALRPOP's documentation addresses this specifically:

> you can use hashes if you need to embed quotes, like r#"..."..."#.

In Rust, r"..." is a raw string: every character up to the closing quote is taken literally, including \. Raw strings have no escape character, but if you open with r#" instead of r", the string doesn't end until the parser finds "#. This works with any number of # characters, as long as the opening and closing counts match, so these are all the same string.

"hello"
r"hello"
r#"hello"#
r##"hello"##
r###"hello"###

And so on. With enough surrounding # symbols, the string can contain any " followed by any number of #: a literal opened with n hashes ends only at a " followed by n hashes.
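The equivalence of these literal forms can be checked directly in plain Rust. This is a minimal sketch; the function name `hello_variants` and the `say "hello"` sample are made up for illustration and do not come from the question or from LALRPOP.

```rust
/// Returns the same five-character string written with five literal syntaxes.
/// (`hello_variants` is a hypothetical helper used only for this illustration.)
fn hello_variants() -> [&'static str; 5] {
    [
        "hello",
        r"hello",
        r#"hello"#,
        r##"hello"##,
        r###"hello"###,
    ]
}

fn main() {
    // Every variant is the same &str; only the delimiters differ.
    for s in hello_variants() {
        assert_eq!(s, "hello");
    }

    // With one pair of hashes, a bare quote no longer ends the literal.
    let quoted = r#"say "hello""#;
    assert_eq!(quoted, "say \"hello\"");
    println!("{}", quoted);
}
```

The hashes are part of the delimiter, not of the string, which is why adding more of them never changes the value.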

All of that is overkill for this case (and most others). For your regex, you need just one set of surrounding #, as the documentation shows:

r#"([^"])*"#

However, if you wanted a regex that excluded both " and #, the pattern itself would contain "#, so you would need two sets:

r##"([^"#])*"##
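As a rough check of how many hashes each pattern needs, the sketch below stores both patterns as Rust raw strings and compares them with their escaped equivalents. It only inspects the pattern text and does not run a regex engine or LALRPOP; the function names are hypothetical.

```rust
/// Pattern text for "any run of characters except a double quote".
/// One pair of hashes is enough because the text never contains `"#`.
fn no_quote_pattern() -> &'static str {
    r#"([^"])*"#
}

/// Pattern text that also excludes '#'. The text contains `"#`, which would
/// close an r#"..."# literal early, so two hashes are used here.
fn no_quote_or_hash_pattern() -> &'static str {
    r##"([^"#])*"##
}

fn main() {
    // The raw literals hold exactly the characters between the delimiters.
    assert_eq!(no_quote_pattern(), "([^\"])*");
    assert_eq!(no_quote_or_hash_pattern(), "([^\"#])*");

    println!("{}", no_quote_pattern());
    println!("{}", no_quote_or_hash_pattern());
}
```

Writing the second pattern with a single pair of hashes would not compile, since the literal would end right after `[^`.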

Raw strings are used here because regexes often contain many \ characters, and escaping each one is tedious. You can also use a normal string with an escaped quote (at least in Rust; I haven't tried it with LALRPOP).

"([^\"])*"
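To illustrate the trade-off in plain Rust, the sketch below compares the escaped form with the raw form, and shows how a backslash-heavy pattern grows when written as an ordinary literal. The decimal-number pattern and all function names are invented for the example. Whether LALRPOP accepts the non-raw form as a regex is not tested here.

```rust
/// The quote-excluding pattern as an ordinary literal with an escaped quote.
fn escaped_pattern() -> &'static str {
    "([^\"])*"
}

/// A backslash-heavy pattern (hypothetical example) as a raw string:
/// the backslashes are written once, exactly as the regex engine sees them.
fn raw_decimal_pattern() -> &'static str {
    r"\d+\.\d+"
}

/// The same pattern as an ordinary literal: every backslash must be doubled.
fn escaped_decimal_pattern() -> &'static str {
    "\\d+\\.\\d+"
}

fn main() {
    // Both spellings of the quote pattern produce identical text.
    assert_eq!(escaped_pattern(), r#"([^"])*"#);

    // Both spellings of the decimal pattern produce identical text too,
    // but the raw form is easier to read and harder to get wrong.
    assert_eq!(raw_decimal_pattern(), escaped_decimal_pattern());

    println!("{}", escaped_pattern());
    println!("{}", raw_decimal_pattern());
}
```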

huangapple
  • Published on 2023-05-26 09:59:45
  • When reposting, please keep this link: https://go.coder-hub.com/76337209.html