Lua split string considering nil entry
Question
str = "cat,dog,,horse"
for word in string.gmatch(str, "([^,'',%s]+)") do
    print(word)
end
This code outputs the following.
cat
dog
horse
I also want to account for the empty (nil) entry and get the following output:
cat
dog
nil
horse
How can this be done? Could someone please point me in the right direction?
Answer 1
Score: 3
A few things:
- `nil ~= ""`. You probably want the empty string rather than nil here. It is however trivial to convert one into the other, so I'll be using the empty string in the following code.
- You don't need the parentheses around the `gmatch` pattern. If there are no "captures" (parentheses), the entire pattern is implicitly captured.
- I'm rather confused about the intent of your pattern. You're matching sequences of one or more non-(whitespace, comma, or single quote) characters; that is, you're splitting on all of whitespace, commata, and single quotes. For some reason, you also have `'` and `,` twice in the character class; just once suffices. I'll be assuming you want to split by `,`.
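To make the first two points concrete, here is a minimal sketch. It is an illustration, not part of the original answer: `empty_to_nil` is a hypothetical helper name, and the sample strings are made up for the demonstration.

```lua
-- Hypothetical helper (not from the original answer): map "" to nil,
-- and pass every other string through unchanged.
local function empty_to_nil(s)
  if s == "" then
    return nil
  end
  return s
end

print(nil ~= "")            -- true: nil and "" are different values
print(empty_to_nil(""))     -- nil
print(empty_to_nil("cat"))  -- cat

-- With no explicit capture, gmatch yields the whole match, so these two
-- loops collect the same words.
local a, b = {}, {}
for w in ("cat,dog"):gmatch("[^,]+") do a[#a + 1] = w end
for w in ("cat,dog"):gmatch("([^,]+)") do b[#b + 1] = w end
print(table.concat(a, "|") == table.concat(b, "|"))  -- true
```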
The issue is that currently your pattern uses the `+` (one or more) quantifier when you want `*` (zero or more). Just using `*` works completely fine on Lua 5.4:
Lua 5.4.4 Copyright (C) 1994-2022 Lua.org, PUC-Rio
> local str = "cat,dog,,horse"; for word in str:gmatch"[^,]*" do print(word) end
cat
dog

horse
However, there is an issue when you try to run that same code on LuaJIT: it will produce seemingly random empty strings rather than only producing an empty string for two consecutive delimiters (this could be seen as "technically correct", since the empty string is a match for `*`, but I see it as a violation of the greediness of `*`). One solution is to require each match to end with a delimiter: append a delimiter to the string, then match everything up to each delimiter:
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
JIT: ON SSE2 SSE3 SSE4.1 AMD BMI2 fold cse dce fwd dse narrow loop abc sink fuse
> local str = "cat,dog,,horse"; for word in (str .. ","):gmatch("(.-),") do print(word) end
cat
dog

horse
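The append-a-delimiter approach can also be wrapped in a small helper that collects the fields into a table. This is only a sketch and not part of the original answer: `split_commas` is a hypothetical name, the delimiter is hard-coded to a comma, and empty fields are kept as empty strings.

```lua
-- Hypothetical helper: split `str` on commas, keeping empty fields as "".
local function split_commas(str)
  local fields = {}
  -- Appending a trailing comma means every field, including the last one,
  -- is terminated by a delimiter, so "(.-)," captures each field in turn.
  for field in (str .. ","):gmatch("(.-),") do
    fields[#fields + 1] = field
  end
  return fields
end

local fields = split_commas("cat,dog,,horse")
print(#fields)  -- 4
for i, field in ipairs(fields) do
  -- Show empty fields explicitly so they are visible in the output.
  print(i, field == "" and "<empty>" or field)
end
```

Because every stored value is a string, the table has no holes, so `#fields` and `ipairs` behave as expected.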
A third option would be to split manually using repeated calls to `string.find`. Here's the utility I wrote myself for that:
function spliterator(str, delim, plain)
    assert(delim ~= "")
    local last_delim_end = 0
    -- Iterator of possibly empty substrings between two matches of the delimiter
    -- To exclude empty strings, filter the iterator or use `:gmatch"[...]+"` instead
    return function()
        if not last_delim_end then
            return
        end
        local delim_start, delim_end = str:find(delim, last_delim_end + 1, plain)
        local substr
        if delim_start then
            substr = str:sub(last_delim_end + 1, delim_start - 1)
        else
            substr = str:sub(last_delim_end + 1)
        end
        last_delim_end = delim_end
        return substr
    end
end
The usage in this example would be:
for word in spliterator("cat,dog,,horse", ",") do print(word) end
Whether you want to add this to the `string` table, keep it in a local variable, or perhaps put it in a `require`d string util module is up to you.
Answer 2
Score: 2
I would do this:
str = "cat,dog,,horse"
for word in string.gmatch(str..',', "([^,]*),") do
    print(word == '' and 'nil' or word)
end
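If actual nil values are wanted rather than the string 'nil', note that assigning nil to a table slot leaves a hole, which makes the length operator `#` and `ipairs` unreliable. One possible way around that, sketched here on the assumption that the fields are being stored in a table, is to track the field count separately. The names `fields` and `n` are illustrative and do not come from the answer.

```lua
local str = "cat,dog,,horse"
local fields, n = {}, 0

for word in (str .. ","):gmatch("([^,]*),") do
  n = n + 1
  if word ~= "" then
    fields[n] = word  -- empty fields are simply left as nil
  end
end

-- Iterate by the recorded count instead of ipairs, which would stop at
-- the first nil.
for i = 1, n do
  print(fields[i])  -- prints cat, dog, nil, horse (one per line)
end
```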