Regex split string by two consecutive pipe ||
Question
I want to split the string below on two consecutive pipes (||) using a regex.
Input String
value1=data1||value2=da|ta2||value3=test&user01|
Expected Output
value1=data1
value2=da|ta2
value3=test&user01|
I tried ([^||]+), but it also splits on a single pipe |.
Try out my example - Regex
value2 contains a single pipe, which should not be treated as a separator.
I am using a Lua script like this:
for pair in string.gmatch(params, "([^||]+)") do
  print(pair)
end
Answer 1
Score: 3
You can explicitly find each ||:
$ cat foo.lua
s = 'value1=data1||value2=da|ta2||value3=test&user01|'
offset = 1
for idx in string.gmatch(s, '()||') do
  print(string.sub(s, offset, idx - 1))
  offset = idx + 2
end
-- Deal with the part after the right-most `||`.
-- Must +1 or it'll fail to handle s like "a=b||".
if offset <= #s + 1 then
  print(string.sub(s, offset))
end
$ lua foo.lua
value1=data1
value2=da|ta2
value3=test&user01|
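The same loop can also be wrapped in a function that returns the pieces in a table instead of printing them. This is only a sketch: the name split_on_double_pipe is made up for illustration and is not part of the Lua standard library.

```lua
-- Hypothetical helper (not part of the Lua standard library): split s on
-- every literal "||" and return the pieces in a table.
local function split_on_double_pipe(s)
  local parts = {}
  local offset = 1
  -- "()" captures the position at which each "||" starts.
  for idx in string.gmatch(s, "()||") do
    parts[#parts + 1] = string.sub(s, offset, idx - 1)
    offset = idx + 2 -- skip the two pipe characters
  end
  -- Whatever follows the last "||" (possibly an empty string).
  parts[#parts + 1] = string.sub(s, offset)
  return parts
end

local parts = split_on_double_pipe("value1=data1||value2=da|ta2||value3=test&user01|")
for i, part in ipairs(parts) do
  print(i, part)
end
```

With this version an input that ends in || produces a trailing empty string as its last element, which matches what the original loop prints for "a=b||".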
Regarding ()||, see Lua's documentation on patterns (Lua does not support regular expressions):
> Captures:
>
> A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the substrings of the subject string that match captures are stored (captured) for future use. Captures are numbered according to their left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the part of the string matching "a*(.)%w(%s*)" is stored as the first capture, and therefore has number 1; the character matching "." is captured with number 2, and the part matching "%s*" has number 3.
>
> As a special case, the capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
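To see the position capture on its own, the short sketch below runs the manual's "flaaap" example and then lists where each || starts in the question's input. The variable names are arbitrary.

```lua
-- "()" is a position capture: it yields a number, not a substring.
-- This is the manual's example; the two captures are 3 and 5.
local first, second = string.match("flaaap", "()aa()")
print(first, second)

-- Collect the start position of every "||" in the question's input.
local s = "value1=data1||value2=da|ta2||value3=test&user01|"
local positions = {}
for idx in string.gmatch(s, "()||") do
  positions[#positions + 1] = idx
end
print(table.concat(positions, ", "))
```

Because | has no special meaning in Lua patterns, "||" matches two literal pipe characters and needs no escaping.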
Answer 2
Score: 2
The easiest way is to replace the two-character sequence || with a single character (e.g. ;) that does not occur in the data, and then use that character as the separator:
local params = "value1=data1||value2=da|ta2||value3=test&user01|"
for pair in string.gmatch(params:gsub('||', ';'), "([^;]+)") do
  print(pair)
end
If every printable character can occur in the data, a non-printable character can be used as the separator instead, written by its code: string.char(10) is the same as "\10", which is "\n". Even code 1 works, written as "\1":
string.gmatch(params:gsub('||', '\1'), "([^\1]+)")
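A minimal sketch of the sentinel approach as a function is shown below. The name split_with_sentinel and the up-front check are additions for illustration, on the assumption that byte 1 never appears in real input; the function raises an error when that assumption does not hold.

```lua
-- Hypothetical helper: split on "||" by first replacing it with a sentinel
-- byte. Assumption: the sentinel "\1" does not occur in the data, so the
-- function raises an error if it does.
local function split_with_sentinel(s)
  local sentinel = "\1"
  -- Plain-text search (4th argument true) for the sentinel in the input.
  assert(not string.find(s, sentinel, 1, true), "input contains the sentinel byte")
  -- The extra parentheses drop gsub's second return value (the match count).
  local replaced = (string.gsub(s, "||", sentinel))
  local parts = {}
  for piece in string.gmatch(replaced, "[^" .. sentinel .. "]+") do
    parts[#parts + 1] = piece
  end
  return parts
end

local params = "value1=data1||value2=da|ta2||value3=test&user01|"
for _, pair in ipairs(split_with_sentinel(params)) do
  print(pair)
end
```

Because the pattern uses +, empty fields are skipped: an input such as "a||||b" yields only "a" and "b".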