January 7, 2020 00:50:46

Combining state and token throws. Why?

Question

This works

sub test-string( $string )
{
    my token opening-brace { \( };
    my token closing-brace { \) };
    my token balanced-braces { 
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars } 
    };
    
    so $string ~~ /^ &lt;balanced-braces&gt; $/;
}

This

sub test-string( $string )
{
    state token opening-brace { \( };
    state token closing-brace { \) };
    state token balanced-braces { 
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars } 
    };
    
    so $string ~~ /^ &lt;balanced-braces&gt; $/;
}

dies with

No such method 'opening-brace' for invocant of type 'Match'
  in regex balanced-braces at ch-2.p6 line 13
  in sub test-string at ch-2.p6 line 17
  in block <unit> at ch-2.p6 line 23

I would prefer the second version, since I believe the first version is quite inefficient when it has to set up the tokens every time the function is called. So if this were real code and not a challenge entry, I'd have to make the tokens global (at file scope).

Why does this even happen?
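For comparison, here is a minimal sketch of the file-global variant mentioned in the question. It assumes the three tokens are simply moved out of the sub to file scope, still declared with `my`, so the lexical `<opening-brace>` lookup that works in the first version keeps working. This is an editorial illustration, not code from the question.

```raku
# Tokens declared once, at file scope, with `my` (lexical), so they are
# visible both inside balanced-braces and inside test-string's regex.
my token opening-brace { \( };
my token closing-brace { \) };
my token balanced-braces {
    # Capture the run of opening braces, then require exactly as many
    # closing braces as were captured.
    ( <opening-brace>+ ) <closing-brace> ** { $0.chars }
};

sub test-string( $string )
{
    so $string ~~ /^ <balanced-braces> $/;
}

say test-string('((()))');   # True
say test-string('(()');      # False
```

Because the file's mainline runs only once, the lexical entries for the tokens are set up once rather than on every call.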

Answer 1

Score: 6

TL;DR I like take 0. There's a workaround (see take 1) but I don't think it's worthwhile. I don't think it's inefficient with a plain my (see take 2). I think use of state with a regex/method should be rejected at compile time (see takes 3 and 5) or left as is (see take 4). Unless you're a coding genius willing to persuade jnthn that Rakudo should embark on a dramatically increased exposure to continuations (see take 5).

Why does this even happen? (take 1)

"This" doesn't happen if you write it like so:

sub test-string( $string )
{
    state &opening-brace = token { \( }
    state &closing-brace = token { \) }
    state &balanced-braces = token { 
        ( <&opening-brace>+ ) <&closing-brace> ** { $0.chars } 
    }

    so $string ~~ /^ <&balanced-braces> $/;
}

(The need for the & in the regex calls slightly surprises me.¹)

Why does this even happen? (take 2)

Why does what happen?

> I believe the first version is quite inefficient when it has to set up the tokens every time the function is called.

What do you mean by "believe" and "quite inefficient" and "set up the tokens"? I would expect the regex code to be compiled just once (I'd be shocked if it were compiled each time) but haven't profiled to verify.

Which leads me to a series of questions:

Is your concern purely the time taken to recreate the 3 lexpad entries (&opening-brace etc.; more generally, one per regex) each time the test-string function is called?

Have you actually profiled running your original code and seen a significant problem?

Have you truly measured this and found it to be part of your "critical 3%" in an actual project?
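One rough way to start answering these questions is to time many calls of the original `my` version. This is a minimal sketch that assumes wall-clock timing with `now` is good enough for a first look; the iteration count and the input string are arbitrary choices. For real profiling, Rakudo's `--profile` command-line option (MoarVM backend) is the better tool.

```raku
# The original, working `my` version from the question.
sub test-string( $string )
{
    my token opening-brace { \( };
    my token closing-brace { \) };
    my token balanced-braces {
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars }
    };

    so $string ~~ /^ <balanced-braces> $/;
}

# Crude wall-clock measurement: `now` returns an Instant, and the
# difference of two Instants is a Duration (in seconds).
my $iterations = 10_000;
my $start      = now;
test-string('((()))') for ^$iterations;
my $elapsed    = now - $start;

say "$iterations calls took $elapsed seconds";
```

Comparing this figure against the state-based workaround from take 1, under the same conditions, would show whether the per-call setup matters in practice.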

Why does this even happen? (take 3)

The state declarator does a reasonable thing with subs -- it produces a compile-time error:

state sub foo {}    # Compile time error: "Cannot use 'state' with sub declaration"
state my sub foo {} # Compile time error: "Type 'my' is not declared"

But with a method (which is what a regex is under the covers) it compiles but does nothing useful:

state method foo {} # Compiles, but I failed to find a way to access `foo`
state regex bar {.}  # Same

I've looked in Rakudo's GH issues queue and failed to find an issue discussing anything like the last two lines of code above (which are essentially the same as your token case). Perhaps folk haven't noticed this or at least didn't feel it would be helpful to file a bug?

Why does this even happen? (take 4)

So you would post an SO documenting that state regex should be rejected at compile-time or do something useful. And @Scimon++ would document another way to look at things. And me some more.

Why does this even happen? (take 5)

<Your Compiler Code Goes Here>

Because Raku is our MMORPG. If you would prefer to see the state declarator do something useful when used with a routine declaration (presumably it should either produce a compile-time error, like it currently does with a sub, or do some fancy continuation thing within the constraint of the "scoped continuations" atop which Raku is built), then that work is plausibly just a "smop" away given that the Rakudo compiler is mostly written in Raku. Someone has deliberately made state on a sub a compile-time error, and the continuation notion would be a truly colossal project, so I think the appropriate thing, if any, in the next few years, would be to make state on a method or rule also a compile-time error.

Or, perhaps more appropriately still, now that this is covered by an SO, with a documented alternative (a grammar) and a workaround (take 1), it's just time to move on to the next level...
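A minimal sketch of the grammar alternative, assuming the three tokens are simply moved into a grammar with a `TOP` token; the grammar name `BalancedBraces` is hypothetical, and this is an illustration rather than the code from the other answer. Grammar tokens are methods of the grammar class, so they are declared once with the class instead of being set up on each call.

```raku
# The tokens become methods of a grammar; TOP is the default entry point.
grammar BalancedBraces {
    token TOP             { <balanced-braces> }
    token opening-brace   { \( }
    token closing-brace   { \) }
    token balanced-braces {
        # Same logic as the question: N opening braces, then N closing ones.
        ( <opening-brace>+ ) <closing-brace> ** { $0.chars }
    }
}

sub test-string( $string )
{
    # .parse must match the whole string; it returns Nil on failure.
    so BalancedBraces.parse($string);
}

say test-string('((()))');   # True
say test-string('(()');      # False
```

Since `.parse` already requires a match from the start to the end of the string, the explicit `^` and `$` anchors from the original regex are no longer needed.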

Footnotes

¹ See my answer to Difference in ... regex scope. The behavior of the regexes declared with state appears not to follow a straightforward reading of the design speculation I quoted in that answer. And at least the following bit of my narrative from that answer is wrong too...

> "<bar> is as explained above. It preferentially resolves to an early bound lexical (my/our) routine/rule named &bar."

...because in the take 1 code of this answer the regex calls have to be prefixed with an & to work. Maybe it's pure accident they work at all.
