Regex split string by two consecutive pipe ||

Question

I want to split the string below on two consecutive pipes (||).

Input String

value1=data1||value2=da|ta2||value3=test&user01|

Expected Output

value1=data1
value2=da|ta2
value3=test&user01|

I tried ([^||]+), but it also splits on a single pipe |.

Try out my example - Regex

The single pipe in value2 should not be treated as a separator.

I am using a Lua script like this:

for pair in string.gmatch(params, "([^||]+)") do 
 print(pair) 
end

Answer 1

Score: 3

You can explicitly find each ||.

$ cat foo.lua
s = 'value1=data1||value2=da|ta2||value3=test&user01|'

offset = 1
for idx in string.gmatch(s, '()||') do
    print(string.sub(s, offset, idx - 1))
    offset = idx + 2
end
-- Deal with the part after the right-most `||`.
-- Must +1 or it'll fail to handle s like "a=b||".
if offset <= #s + 1 then
    print(string.sub(s, offset))
end
$ lua foo.lua
value1=data1
value2=da|ta2
value3=test&user01|

Regarding ()||, see the Patterns section of the Lua manual (Lua uses its own pattern syntax and has no regex support):

> * Captures:
>
> A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture, and therefore has number 1; the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
>
> As a special case, the capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
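
To make the position capture concrete, here is a small sketch that reproduces the manual's "flaaap" example and then lists where each || starts in a short string. The sample string "a||b|c||d" and the variable names are made up for illustration.

```lua
-- Position captures: () yields the 1-based index at that point of the match.
local first, last = string.match("flaaap", "()aa()")
print(first, last)  -- 3 and 5, tab-separated

-- In '()||' the position capture records where each literal "||" begins.
-- '|' has no special meaning in Lua patterns, so it needs no escaping.
local s = "a||b|c||d"
local starts = {}
for idx in string.gmatch(s, "()||") do
    starts[#starts + 1] = idx
end
print(table.concat(starts, ","))  -- 2,7 (the single '|' at index 5 is skipped)
```

Each index marks the first character of a separator, which is why the loop in the answer prints up to idx - 1 and then resumes at idx + 2.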

Answer 2

Score: 2

The easiest way is to replace the two-character sequence || with a single character that does not occur in the data (e.g. ;), and then split on that character:

local params = "value1=data1||value2=da|ta2||value3=test&user01|"

for pair in string.gmatch(params:gsub('||', ';'), "([^;]+)") do
 print(pair)
end

If any printable character can occur in the data, use a non-printable one instead, written by its character code: string.char(10), "\10", and "\n" all denote the same character. Even code 1 works: "\1"

string.gmatch(params:gsub('||', '\1'), "([^\1]+)")
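
The substitution trick depends on a character that never appears in the data, and because the pattern uses +, it silently drops empty fields (for example in "a||||b"). If either matters, one alternative is to search for the literal separator with string.find in plain mode. This is only a sketch: the split helper below is a hypothetical name, not something from either answer, and it assumes a non-empty separator.

```lua
-- Hypothetical helper (not from the answers above): split `s` on the literal
-- separator `sep`, keeping empty fields. Assumes `sep` is a non-empty string.
local function split(s, sep)
    assert(sep ~= "", "separator must not be empty")
    local parts = {}
    local start = 1
    while true do
        -- Fourth argument `true` turns off pattern matching: `sep` is plain text.
        local from, to = string.find(s, sep, start, true)
        if not from then
            -- No separator left: the remainder is the last field.
            parts[#parts + 1] = string.sub(s, start)
            break
        end
        parts[#parts + 1] = string.sub(s, start, from - 1)
        start = to + 1
    end
    return parts
end

local params = "value1=data1||value2=da|ta2||value3=test&user01|"
for _, pair in ipairs(split(params, "||")) do
    print(pair)
end
```

With the sample input this prints the same three lines as the other approaches; the single pipes inside value2 and at the end of value3 stay in place because only the exact two-character sequence is searched for.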

huangapple
  • Posted on January 6, 2023 at 14:31:43
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75027692.html