Safely escaping Raku regex metacharacters

Question

I want to convert a glob-style pattern into a Raku regex. This is how I'm doing it now:

s :global {
    || $<question-mark> = '?'
    || $<asterisk>      = '*'
    || $<non-word>      = \W
} = $<question-mark> ?? '.' !! $<asterisk> ?? '.*' !! "\\$<non-word>";

Is it correct to prefix every non-word character with a backslash in this way? That is, will this miss escaping anything that should be, or escape anything that shouldn't be?

I'm a little baffled that Raku did away with Perl 5's quotemeta function, which would be ideal here. It wouldn't be needed nearly as often, as noted in the answers to this question, but in a situation like this I'm left to hand-roll a solution that I'm not sure is adequate.

Answer 1

Score: 11

Raku regexes can contain quoted string literals:

say "food" ~~ /. "oo" /; # 「foo」

One can take a Str and turn it into a Raku source representation by calling .raku:

say "oh\n\"".raku; # "oh\n\""

This handles escaping of the string construct as required, meaning it is then safe to emit into the regex.
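To make the idea concrete for the glob case in the question, here is a minimal sketch that combines the two points above. The helper name `glob-to-regex-source` and the use of `<$source>` to interpolate the generated text as regex source are my assumptions, not part of the answer; only `?` and `*` are treated as wildcards, and every other character is emitted as a quoted literal via `.raku`.

```raku
# Hypothetical helper: turn a glob-style pattern into Raku regex source text.
# '?' becomes '.', '*' becomes '.*', and any other character becomes a
# quoted string literal produced by .raku, so it cannot act as a metacharacter.
sub glob-to-regex-source(Str $glob --> Str) {
    $glob.comb.map({
        $_ eq '?' ?? '.'
        !! $_ eq '*' ?? '.*'
        !! .raku
    }).join(' ')
}

my $source = glob-to-regex-source('a+b*.t?t');
say $source;

# <$source> compiles the string as regex source at match time.
my $regex = rx/ ^ <$source> $ /;
say so 'a+b-notes.txt' ~~ $regex;
say so 'aab-notes.txt' ~~ $regex;
```

The first test string should match and the second should not, since the `+` and `.` in the glob are matched literally. An empty glob is not handled by this sketch.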

As an aside, while it's still experimental, the upcoming RakuAST will allow for constructing regexes by building up an AST, which will provide another safe and more general solution.

Answer 2

Score: 2

**With my version of Rakudo (`v2022.07`), the following escape-wrapping works:**

1. take the literal and wrap in `q[`…`]`,
2. take the `q[…]` above and wrap in `<{`…`}>`.

_Tested as a one-liner at either the `zsh` or `bash` command line:_
```perl
~$ zsh
~% raku -e 'say "food" ~~ / . <{ q[oo] }> /;'
「foo」

~% bash
~$ raku -e 'say "food" ~~ / . <{ q[oo] }> /;'
「foo」
```

Other variations of Raku's quoting constructs (the "Q language") can be tried; I've had success with square brackets as above. See https://docs.raku.org/language/quoting.html. Make sure to add the < > angle brackets: without them, the literal wrapped in { } curlies seems to vanish, because a bare { } is executed as a code block and matches nothing:

~$ zsh
~% raku -e 'say "food" ~~ / { q[food] } /;'
「」
~% raku -e 'say "nothing" ~~ / { q[nothing] } /;'
「」

~% bash
~$ raku -e 'say "food" ~~ / { q[food] } /;'
「」
~$ raku -e 'say "nothing" ~~ / { q[nothing] } /;'
「」
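One caveat when applying this to the original question: as far as I understand, `<{ … }>` (like `<$var>`) treats the resulting string as regex source, so metacharacters inside it stay active, whereas a bare `$var` is matched as a literal string. The following sketch illustrates the difference; the variable name `$literal` is only for illustration.

```raku
my $literal = 'o.';    # contains the regex metacharacter "."

# A bare variable is interpolated as a literal string: "." must match a real dot.
say so 'foo' ~~ / $literal /;       # False
say so 'fo.' ~~ / $literal /;       # True

# <$var> and <{ ... }> compile the string as regex source: "." matches any character.
say so 'foo' ~~ / <$literal> /;     # True
say so 'foo' ~~ / <{ q[o.] }> /;    # True
```

So when the text must be matched verbatim, plain `$literal` interpolation (or a quoted literal) is probably the safer choice, and `<{ … }>` fits best when the string is meant to be a pattern.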

The above might be most useful for regexes in cross-platform one-liners, rather than swapping the Linux/Unix convention of single quotes outside and double quotes inside for the Windows convention of double quotes outside and single quotes inside, or vice versa. You can even try qb[…] to get backslash-escape recognition (useful, for example, when \n newlines are problematic):

~$ zsh
~% raku -e 'say "food\ntruck" ~~ / . <{qb[ ood \\n tru ]}> .. /;'
「food
truck」

~% bash
~$ raku -e 'say "food\ntruck" ~~ / . <{qb[ ood \\n tru ]}> .. /;'
「food
truck」

Credit to @fecundf for starting many of us on the topic of understanding/codifying interpolation within a regex matcher (feel free to peruse the thread below).

https://www.nntp.perl.org/group/perl.perl6.users/2019/09/msg6960.html

Posted by huangapple on June 14, 2023 at 23:38:20
Original link: https://go.coder-hub.com/76475291.html