January 6, 2023 14:31:43

Regex split string by two consecutive pipe ||

Question

I want to split the string below on two consecutive pipes (||) using a regex.

Input String

value1=data1||value2=da|ta2||value3=test&user01|

Expected Output

value1=data1
value2=da|ta2
value3=test&user01|

I tried ([^||]+), but it also splits on a single pipe |.

Try out my example - Regex

value2 contains a single pipe, which should not be treated as a separator.

I am using a Lua script like this:

for pair in string.gmatch(params, "([^||]+)") do
    print(pair)
end

Answer 1

Score: 3

You can explicitly find each ||.

$ cat foo.lua
s = 'value1=data1||value2=da|ta2||value3=test&user01|'
offset = 1
-- `()` captures the position where each literal `||` starts.
for idx in string.gmatch(s, '()||') do
    print(string.sub(s, offset, idx - 1))
    offset = idx + 2
end
-- Deal with the part after the right-most `||`.
-- The +1 is required, or it will fail to handle an s like "a=b||".
if offset <= #s + 1 then
    print(string.sub(s, offset))
end
$ lua foo.lua
value1=data1
value2=da|ta2
value3=test&user01|
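
If the split is needed in more than one place, the same idea can be wrapped in a function. The following is only a sketch: the name `split` and its two parameters are hypothetical, and it swaps the position capture for `string.find` in plain mode (fourth argument `true`), so that the separator is treated as literal text and never needs escaping.

```lua
-- Hypothetical helper (not part of the answer): split `s` on the literal `sep`.
local function split(s, sep)
    assert(sep ~= "", "separator must not be empty")
    local parts = {}
    local offset = 1
    while true do
        -- Plain find: `sep` is matched as literal text, not as a pattern.
        local first, last = string.find(s, sep, offset, true)
        if not first then break end
        parts[#parts + 1] = string.sub(s, offset, first - 1)
        offset = last + 1
    end
    -- Whatever follows the last separator (possibly an empty string).
    parts[#parts + 1] = string.sub(s, offset)
    return parts
end

local fields = split("value1=data1||value2=da|ta2||value3=test&user01|", "||")
for _, field in ipairs(fields) do
    print(field)
end
```

Like the loop above, this keeps a trailing empty field, so an input such as "a=b||" yields "a=b" followed by an empty string.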

Regarding ()||, see the Lua manual's section on Patterns (Lua does not support regular expressions):

> * Captures:
>
> A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture, and therefore has number 1; the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
>
> As a special case, the capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
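
The position capture described in the quote can be tried directly. This is a small illustration; the second test string is made up for the example.

```lua
-- `()` yields positions (numbers), not substrings.
local first, second = string.match("flaaap", "()aa()")
print(first, second)  --> 3 5

-- With gmatch, the position capture gives the start index of every `||`.
local s = "a||b|c||d"  -- made-up sample string
local starts = {}
for idx in string.gmatch(s, "()||") do
    starts[#starts + 1] = idx
end
print(table.concat(starts, ","))  --> 2,7
```

The single pipe at index 5 is skipped because the pattern only matches two pipes in a row.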

Answer 2

Score: 2

The easiest way is to replace the two-character sequence || with a single character (e.g. ;) that does not occur in the data, and then use that character as the separator:

local params = "value1=data1||value2=da|ta2||value3=test&user01|"
for pair in string.gmatch(params:gsub('||',';'), "([^;]+)") do
    print(pair)
end
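
The replacement is needed because a set such as [^;] or [^||] matches one character at a time, so writing | twice inside the brackets changes nothing. A short sketch to confirm this, where `collect` is a hypothetical helper introduced only for the comparison:

```lua
local params = "value1=data1||value2=da|ta2||value3=test&user01|"

-- Hypothetical helper: collect every match of `pattern` in `s` into a table.
local function collect(s, pattern)
    local out = {}
    for piece in string.gmatch(s, pattern) do
        out[#out + 1] = piece
    end
    return out
end

-- A set describes a single character, so the doubled pipe is redundant.
local doubled = collect(params, "([^||]+)")
local single = collect(params, "([^|]+)")
print(#doubled, #single)  --> 4 4
print(table.concat(doubled, " / "))
```

Both patterns split value2 at its single pipe, which is exactly the behaviour reported in the question.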

If every printable character can occur in the data, a non-printable character can be used instead, written by its code: string.char(10), "\10" and "\n" are the same string, and even code 1 works: "\1"

string.gmatch(params:gsub('||','\1'), "([^\1]+)")
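
A possible way to package the sentinel approach is sketched below. The function name `split_on_double_pipe` is hypothetical, and the check that the sentinel does not already occur in the input is an added safeguard, not part of the answer.

```lua
-- Hypothetical helper: split on `||` by first swapping it for a sentinel character.
local function split_on_double_pipe(s)
    local sentinel = "\1"  -- assumed not to occur in the data; checked below
    assert(not string.find(s, sentinel, 1, true), "input already contains the sentinel")
    -- gsub returns (string, count); the extra parentheses keep only the string.
    local replaced = (string.gsub(s, "||", sentinel))
    local parts = {}
    for piece in string.gmatch(replaced, "([^" .. sentinel .. "]+)") do
        parts[#parts + 1] = piece
    end
    return parts
end

for _, pair in ipairs(split_on_double_pipe("value1=data1||value2=da|ta2||value3=test&user01|")) do
    print(pair)
end
```

Because the pattern uses `+`, empty fields are dropped: an input such as "a||||b" yields only "a" and "b". If empty fields matter, finding each || explicitly is the safer choice.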
