Lua split string considering nil entry

Question

str = "cat,dog,,horse"
for word in string.gmatch(str, "([^,'',%s]+)") do
    print(word)
end

This code outputs the following.

cat
dog
horse

I also want empty entries to be treated as nil, so that the output is:

cat
dog
nil
horse

How can this be done? Could someone point me in the right direction?

Answer 1

Score: 3

A few things:

  • nil ~= "". You probably want the empty string rather than nil here. It is however trivial to convert one into the other, so I'll be using the empty string in the following code.
  • You don't need the parentheses around the gmatch pattern. If there are no "captures" (parentheses), the entire pattern is implicitly captured.
  • I'm rather confused about the intent of your pattern. You're matching sequences of one or more characters that are not whitespace, commas, or single quotes; that is, you're splitting on whitespace, commas, and single quotes alike. For some reason, you also have ' and , twice in the character class; once suffices. I'll assume you want to split on , only.
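
To illustrate the first two points, here is a minimal sketch. The helper name empty_to_nil is my own, not a standard function, and the sample string is shortened for the demonstration.

```lua
-- nil and the empty string are distinct values.
assert(nil ~= "")

-- Hypothetical helper: maps "" to nil and returns any other string unchanged.
local function empty_to_nil(s)
  if s == "" then
    return nil
  end
  return s
end

-- With no explicit capture, gmatch yields the whole match,
-- so both loops collect the same words.
local with_parens, without_parens = {}, {}
for w in ("cat,dog"):gmatch("([^,]+)") do
  with_parens[#with_parens + 1] = w
end
for w in ("cat,dog"):gmatch("[^,]+") do
  without_parens[#without_parens + 1] = w
end

print(empty_to_nil(""), empty_to_nil("cat"))
print(table.concat(with_parens, " "), table.concat(without_parens, " "))
```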

The issue is that your pattern uses the + (one or more) quantifier where you want * (zero or more). Simply switching to * works fine on Lua 5.4:

Lua 5.4.4  Copyright (C) 1994-2022 Lua.org, PUC-Rio
> local str = "cat,dog,,horse"; for word in str:gmatch"[^,]*" do print(word) end
cat
dog

horse

However, the same code misbehaves on LuaJIT: it produces seemingly random empty strings rather than only one empty string between two consecutive delimiters. This could be seen as "technically correct", since the empty string is a match for *, but I see it as a violation of the greediness of *. One solution is to append a delimiter to the string, require each match to end with a delimiter, and capture everything before it:

LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
JIT: ON SSE2 SSE3 SSE4.1 AMD BMI2 fold cse dce fwd dse narrow loop abc sink fuse
> local str = "cat,dog,,horse"; for word in (str .. ","):gmatch("(.-),") do print(word) end
cat
dog

horse
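
If you need the fields in a table rather than an iterator, the append-a-delimiter trick can be wrapped in a function. This is only a rough sketch, assuming the separator is meant literally (not as a pattern); split and its return values are names I made up for illustration.

```lua
-- Hypothetical helper: splits str on a literal separator, keeping empty fields.
local function split(str, sep)
  assert(sep ~= "", "separator must not be empty")
  -- Prefix every punctuation character with % so that sep is matched literally.
  local escaped = (sep:gsub("%p", "%%%0"))
  local fields, n = {}, 0
  -- Append one separator so that every field, including the last, ends with it.
  for field in (str .. sep):gmatch("(.-)" .. escaped) do
    n = n + 1
    fields[n] = field
  end
  return fields, n
end

local fields, n = split("cat,dog,,horse", ",")
for i = 1, n do
  print(i, fields[i])
end
```

Empty fields are stored as "" rather than nil here, so the table has no holes and #fields agrees with the returned count.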

A third option would be to split manually using repeated calls to string.find. Here's the utility I wrote myself for that:

function spliterator(str, delim, plain)
	assert(delim ~= "")
	local last_delim_end = 0

	-- Iterator of possibly empty substrings between two matches of the delimiter
	-- To exclude empty strings, filter the iterator or use `:gmatch"[...]+"` instead
	return function()
		if not last_delim_end then
			return
		end

		local delim_start, delim_end = str:find(delim, last_delim_end + 1, plain)
		local substr
		if delim_start then
			substr = str:sub(last_delim_end + 1, delim_start - 1)
		else
			substr = str:sub(last_delim_end + 1)
		end
		last_delim_end = delim_end
		return substr
	end
end

Usage for this example:

for word in spliterator("cat,dog,,horse", ",") do print(word) end
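
The third parameter, plain, is handed straight to string.find, where it switches off pattern matching for the delimiter. Here is a small sketch of why that matters when the delimiter contains a magic character such as the dot (the variable names are mine):

```lua
local s = "a.b"

-- As a pattern, "." matches any character, so the first match is at position 1.
local pattern_start, pattern_end = s:find(".", 1)

-- With plain = true, "." is a literal dot, which sits at position 2.
local plain_start, plain_end = s:find(".", 1, true)

print(pattern_start, pattern_end)
print(plain_start, plain_end)
```

So to split on a literal dot you would presumably call spliterator(str, ".", true), or escape the dot as "%." and leave plain out.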

Whether you add this to the string table, keep it in a local variable, or put it in a string utility module that you require is up to you.
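
As a sketch of the first option: a function stored in the string table can be called with method syntax on any string. To keep the snippet self-contained I use a small stand-in named split_fields (a hypothetical name, and not the spliterator above); it only handles a plain, non-empty delimiter and returns a table.

```lua
-- Hypothetical stand-in helper: splits on a plain delimiter, keeping empty fields.
local function split_fields(str, delim)
  assert(delim ~= "", "delimiter must not be empty")
  local fields, pos = {}, 1
  while true do
    local s, e = str:find(delim, pos, true)
    if not s then
      -- No further delimiter: the rest of the string is the last field.
      fields[#fields + 1] = str:sub(pos)
      break
    end
    fields[#fields + 1] = str:sub(pos, s - 1)
    pos = e + 1
  end
  return fields
end

-- Option 1: add it to the string table so every string can call it as a method.
string.split_fields = split_fields

local parts = ("cat,dog,,horse"):split_fields(",")
print(#parts)
```

Keeping the function local, or returning it from a module, avoids touching the shared string table, which other code may also rely on.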

Answer 2

Score: 2

I would do this:

  str = "cat,dog,,horse"
  for word in string.gmatch(str..',', "([^,]*),") do
    print(word == '' and 'nil' or word)
  end
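
Note that this prints the string 'nil', not the value nil. If you want actual nil entries in a table, one possible variation is to leave holes and track the number of fields yourself, since the length operator is not reliable for tables with holes. This is a sketch; the explicit counter n and the table name fields are my additions.

```lua
local str = "cat,dog,,horse"
local fields, n = {}, 0

for word in (str .. ","):gmatch("([^,]*),") do
  n = n + 1
  -- Store only non-empty words; an empty field stays nil at index n.
  if word ~= "" then
    fields[n] = word
  end
end

-- Iterate by the explicit count, because #fields is unreliable with holes.
for i = 1, n do
  print(fields[i])
end
```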

huangapple
  • Posted on July 11, 2023, 02:40:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76656468.html