2023-04-17 08:11:47

Parsing Lisp-Like String into Tokens and Literal Text

Question

Using the $Matches parameter of PHP 8's preg_match_all() function, I need to split a template into a list of literal-text runs and delimited tokens.

$x = preg_match_all($Regex, $Template, $Matches, PREG_OFFSET_CAPTURE); // Parse the template.

The catch is that tokens can be nested, and I need to match only the outermost token of each nest.

Example:

This {is {{Par}m1}} plus {{Par{m3a{{Parm3b}}}} a}nd {{Parm4a||{{Par}m4b||{{Parm4c||{{Parm4d||Parm}}}}}}}}.

Should parse into this:

 Match 1: This {is
 Match 2: {{Par}m1}}
 Match 3:  plus
 Match 4: {{Par{m3a{{Parm3b}}}}
 Match 5:  a}nd
 Match 6: {{Parm4a||{{Par}m4b||{{Parm4c||{{Parm4d||Parm}}}}}}}}
 Match 7: .

Notice above that single curly braces should be allowed in tokens or in text.

Only double curly braces are considered token delimiters.

The regular expression I have so far works only if there are no single curly braces in the text or tokens.

My regex:

(?:(?!(\{\{)).)+|((\{\{)((?>[^{}]+|(?2))*)(\}\}))

I cannot figure out how to allow single curly braces in the text or inside tokens without breaking the list of matches.

Any help greatly appreciated!

UPDATE

I am continuing to work on this problem and came up with this:

\{\{(?R)*\}\}|[^{}]+

It uses the recursion operator but it still suffers from the same issue in that single curly braces break the parsing.

The proper delimiter is intended to be opening and closing double-curly-braces "{{" and "}}".
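
A minimal sketch of the failure described in the update, using a shortened sample string (the variable names and the sample are assumptions for illustration). Because [^{}]+ can never consume a lone brace, the recursion inside a token stops at the single "}" and the closing \}\} then fails, so the token is never matched as a whole:

```php
<?php
// The pattern from the UPDATE above, wrapped in PHP delimiters.
$Regex = '/\{\{(?R)*\}\}|[^{}]+/';

// Shortened sample: one single "{" in the text, one single "}" in the token.
$Template = 'This {is {{Par}m1}} plus';

preg_match_all($Regex, $Template, $Matches);

// Print every full match on its own line, in brackets to show whitespace.
foreach ($Matches[0] as $Piece) {
    echo '[', $Piece, "]\n";
}

// The token is broken apart, and the braces are dropped from the result.
var_dump(in_array('{{Par}m1}}', $Matches[0], true));
var_dump(implode('', $Matches[0]) === $Template);
```

Both var_dump() calls report false here: the token never appears as a single match, and the matches do not add up to the original string.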

Answer 1

Score: 1

I think I found a solution. It has passed my tests so far.

The regex is

(\{\{)(?R)*(\}\})|(?:(?!\{\{|\}\}).)+
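
For reference, a minimal sketch of how this pattern might be wired into the preg_match_all() call from the question, run against the question's example string. The / delimiters, the s modifier (so that . also matches newlines), and the $Kind and $Parts names are assumptions added here, not part of the original answer:

```php
<?php
// Recursive pattern from this answer; the `s` modifier is an addition.
$Regex = '/(\{\{)(?R)*(\}\})|(?:(?!\{\{|\}\}).)+/s';

$Template = 'This {is {{Par}m1}} plus {{Par{m3a{{Parm3b}}}} a}nd '
          . '{{Parm4a||{{Par}m4b||{{Parm4c||{{Parm4d||Parm}}}}}}}}.';

$Count = preg_match_all($Regex, $Template, $Matches, PREG_OFFSET_CAPTURE);

// $Matches[0] holds the full matches; each entry is [text, byte offset].
$Parts = [];
foreach ($Matches[0] as [$Text, $Offset]) {
    // An outermost token always starts with the opening delimiter.
    $Kind = str_starts_with($Text, '{{') ? 'token' : 'text';
    $Parts[] = [$Kind, $Text, $Offset];
    printf("%-5s @%3d: %s\n", $Kind, $Offset, $Text);
}
```

The negative lookahead (?!\{\{|\}\}) is what lets a single brace through: the text branch only stops where a double brace begins, so a lone { or } is consumed like any other character.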

Testing

Parsing this:

{{one}}{}This is {{Pa}rm1}} p{}lus {{P{ar{}m2}} and2 {{Close1}}{{Close2}} {{Par{m3a{{Parm3}b}}}} and {{Par{m4a||{{Parm4b||{{Parm4c||{{Parm4d||Pa}rm}}}}}}}} end {{Par{}m5}}.

Yields this:

{{one}}
{}This is 
{{Pa}rm1}}
 p{}lus 
{{P{ar{}m2}}
 and2 
{{Close1}}
{{Close2}}
 
{{Par{m3a{{Parm3}b}}}}
 and 
{{Par{m4a||{{Parm4b||{{Parm4c||{{Parm4d||Pa}rm}}}}}}}}
 end 
{{Par{}m5}}
.

So far it seems to be working.
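
One thing that may be worth checking is input with unbalanced delimiters. preg_match_all() silently skips any character where neither branch matches, so a stray "}}" or an unclosed "{{" would simply vanish from the match list. A cheap guard, sketched here with a hypothetical helper named splitTemplate() (not part of the answer above), is to confirm that the matches concatenate back to the original string:

```php
<?php
// Hypothetical helper: returns the ordered pieces, or null when the
// matches do not cover the whole input (or PCRE reports an error).
function splitTemplate(string $Template): ?array
{
    // Same pattern as above; the `s` modifier is an addition.
    $Regex = '/(\{\{)(?R)*(\}\})|(?:(?!\{\{|\}\}).)+/s';
    if (preg_match_all($Regex, $Template, $Matches) === false) {
        return null; // PCRE error, e.g. a backtrack or recursion limit
    }
    $Pieces = $Matches[0];
    // Round trip: every character must belong to exactly one piece.
    return implode('', $Pieces) === $Template ? $Pieces : null;
}

var_dump(splitTemplate('{{one}}{}This is {{Pa}rm1}}') !== null); // balanced input
var_dump(splitTemplate('a }} b') !== null);                      // stray "}}"
var_dump(splitTemplate('{{a') !== null);                         // unclosed "{{"
```

Whether unbalanced input should be rejected or passed through as literal text is a design decision the original post does not address; the check only makes the situation visible.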
