2023年7月20日 15:07:35go评论99阅读模式

英文:

Lua 5.3: strange behavior with table index

问题

我在处理表格时发现了这种奇怪的行为：

folder1 = {1, 2, 3}
table.insert(folder1, "test") --4
folder1[5] = "test2"
print("#folder1: " .. #folder1)  --结果是 5
folder2 = {1, 2, 3}
table.insert(folder2, "test") --4
folder2[6] = "test2"
print("#folder2: " .. #folder2)  --结果是 6
folder3 = {1, 2, 3}
table.insert(folder3, "test") --4
folder3[7] = "test2"
print("#folder3: " .. #folder3)  --结果是 4

第一个情况是正确的，因为它是一个没有间隙的数组。第三个情况可以理解，因为#table返回nil索引之前的最后一个索引。
但是第二种情况是怎么回事？我本来期望在这里得到索引4。

而且似乎只有在使用table.insert之前才会发生这种情况。如果我删除插入命令，输出又会如预期一样：

folder1 = {1, 2, 3}
folder1[4] = "test2"
print("#folder1: " .. #folder1)  --结果是 4
folder2 = {1, 2, 3}
folder2[5] = "test2"
print("#folder2: " .. #folder2)  --结果是 3
folder3 = {1, 2, 3}
folder3[6] = "test2"
print("#folder3: " .. #folder3)  --结果是 3

有人能解释一下吗？

我已经阅读了Lua文档。函数rawlen(table)看起来像是返回表格的真实大小的函数，但我发现rawlen也只返回非nil索引之前的最后一个索引。

英文:

so I was playing around with tables a bit and found this strange behavior:

folder1 = {1, 2, 3}
table.insert(folder1, &quot;test&quot;) --4
folder1[5] = &quot;test2&quot;
print(&quot;#folder1: &quot; .. #folder1)  --yields 5
folder2 = {1, 2, 3}
table.insert(folder2, &quot;test&quot;) --4
folder2[6] = &quot;test2&quot;
print(&quot;#folder2: &quot; .. #folder2)  --yields 6
folder3 = {1, 2, 3}
table.insert(folder3, &quot;test&quot;) --4
folder3[7] = &quot;test2&quot;
print(&quot;#folder3: &quot; .. #folder3)  --yields 4

the first case is correct since it's an array with no gap. The third case is understandable since #table returns the last index before a nil index.
But what happens in case two? I was expecting to receive the index 4 here.

Also it only seems to happen with a table.insert before. If I remove the insert command, the output is as expected again:

folder1 = {1, 2, 3}
folder1[4] = &quot;test2&quot;
print(&quot;#folder1: &quot; .. #folder1)  --yields 4
folder2 = {1, 2, 3}
folder2[5] = &quot;test2&quot;
print(&quot;#folder2: &quot; .. #folder2)  --yields 3
folder3 = {1, 2, 3}
folder3[6] = &quot;test2&quot;
print(&quot;#folder3: &quot; .. #folder3)  --yields 3

Can someone explain this to me?

I was reading into the Lua documentation. the function rawlen(table) was looking like the function that would return the true size of the table, but I figured out, that rawlen also just returns the last index before a non-nil index.

答案1

得分: 1

让我们来解释一下这里发生的事情。

第一个例子：在操作之后，folder1 = {1, 2, 3, "test", 5}。

这是一个合适的序列，从键 1 到 5 的所有条目都不为 nil，所以长度是 5，正如预期的那样。

第二个例子：在操作之后，folder2 = {1, 2, 3, "test", nil, "test2"}。这不是一个合适的“序列”，它有洞。你的解释是

第三种情况是可以理解的，因为 #table 返回 nil 索引之前的最后一个索引。

但是这个定义是错误的；#table 返回任何索引 i，使得 t[i] ~= nil 并且 t[i+1] == nil。它不保证是最后或 第一个 索引，实际上，出于性能原因，它通常不会是最后一个或第一个索引。

在第二个例子中，4 和 6 都是 #folder2 的有效值。你得到的是 6，所以一切都正常。

第三个例子：在操作之后，folder3 = {1, 2, 3, "test", nil, nil, "test2"}。#folder3 的有效值可以是 4 或 7。较大的间隙导致 PUC Lua 计算 #folder3 为 7，这与实现细节有关。

你的第二组示例可以类似地解释：

#{1, 2, 3, "test2"} 是 4，如预期的那样
#{1, 2, 3, nil, "test2"} 可能是 3 或 5；你得到的是 3，这是正常的
#{1, 2, 3, nil, nil, "test2"} 可能是 3 或 6；你得到的是 6，这也是正常的

Lua 表是使用数组和哈希部分实现的。 "数组" 部分的条目不会总是落在数组部分，但是 Lua 希望 {[3] = 3, [2] = 2, [1] = 1} 和 {1, 2, 3} 在程序员的角度看起来是一样的。因此，Lua 需要为哈希表定义 #t；这意味着它必须以某种方式找到长度，不能仅依赖于内部的长度字段。即使对于数组来说，这也行不通，因为程序员可能会在数组的中间设置条目为 nil，然后在数组的末尾设置条目。

如果 Lua 必须返回第一个或最后一个边界索引，它要么必须进行额外的繁琐的开销的记录，要么必须使用线性搜索。这两种选项都不可行。

Lua 所做的是相当聪明的：它使用 二分搜索 来找到任何边界：如果 t[i] 是 nil，那么你知道将会有一个边界 j < i，使得 t[j + 1] 为 nil。如果 t[i] 不是 nil，那么你知道将会有一个边界 j >= i，使得 t[j + 1] 是 nil - 你可以在每一步中将边界的搜索空间减半。为了找到 i 的初始上界，你可以检查二的幂 i = 2^n，直到找到一个 nil 为止。此外，你可以使用最大安全整数作为上界。

以下是我一段时间前在纯 Lua 中重新实现表长度运算符的代码，其中包括一些测试，以便给你一个想法：

local rawget = rawget
local maxint = math.maxinteger or 2^53
function len(tbl)
	if tbl[1] == nil then
		return 0
	end
	if tbl[maxint] then
		return maxint
	end
	-- Quickly ramp numbers up. Could start with high = maxint.
	local low, high = 1, 1
	while rawget(tbl, high) do
		low, high = high, high * 2
		if low > high then -- overflow
			high = maxint
			break
		end
	end
	while low < high do
		local mid = math.ceil((low + high) / 2)
		if tbl[mid] == nil then
			high = mid - 1
		else
			low = mid
		end
	end
	return low
end
assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
	t[2^i] = i
end
print(len(t))

Lua 可以以更少的时间获得更紧密的上界，因为它知道哈希和数组部分的容量，并可以将其用作上界；它不必执行“尝试二的幂”的技巧。

英文:

Let's unpack what's happening here.

First example: After the operations, folder1 = {1, 2, 3, "test", 5}.

This is a proper sequence, all entries from key 1 to 5 are non-nil, so the length is 5, just as expected.

Second example: After the operations, folder2 = {1, 2, 3, "test", nil, "test2"}. This is not a proper "sequence", it has holes. Your explanation is

> The third case is understandable since #table returns the last index before a nil index.

but this definition is wrong; #table returns any index i such that t[i] ~= nil and t[i+1] == nil. It is not guaranteed to be the last or the first index, and in practice it often won't be for performance reasons 1.

In the second example, both 4 and 6 are valid values for #folder2. You are getting 6, so everything is fine here.

Third example: After the operations, folder3 = {1, 2, 3, "test", nil, nil, "test2"}. Both 4 and 7 are valid values for #folder3. The larger gap leads PUC Lua to evaluate #folder3 to 7, which has to do with the implementation details. 1

Your second set of examples can be explained analogeously:

#{1, 2, 3, "test2"} is 4, as expected
#{1, 2, 3, nil, "test2"} may be either 3 or 5; you get 3, which is fine
#{1, 2, 3, nil, nil, "test2"} may be either 3 or 6; you get ``, which is fine

1 Lua tables are implemented using an array and a hash part. The "array" part entries won't always land in the array part, but Lua wants {[3] = 3, [2] = 2, [1] = 1} and {1, 2, 3} to be the same from the programmer's point of view. Thus Lua needs to define #t for hash tables; this means it has to find the length somehow, it can't just rely on an internal length field. Even for arrays this won't work, since the programmer may set entries to nil in the middle of the array, and later on, at the end of the array.

If Lua had to return the first or last boundary index, it would either have to do additional bookkeeping with significant overhead, or it would have to use a linear search. Both aren't viable options.

What Lua does instead is quite clever: It uses a binary search to find any boundary: If t[i] is nil, you know that there will be a boundary j < i such that t[j + 1] is a nil. If t[i] is not nil, you know that there will be a boundary j >= i such that t[j + 1] is nil - you can halve the search space for the boundary in each step. To find an initial upper bound on i, you can just check powers of two i = 2^n until you hit a nil. Furthermore, you can use the max safe integer as an upper bound.

Here is a reimplementation of the table length operator in pure Lua I did a while ago, with some tests included to give you an idea:

local rawget = rawget
local maxint = math.maxinteger or 2^53
function len(tbl)
	if tbl[1] == nil then
		return 0
	end
	if tbl[maxint] then
		return maxint
	end
	-- Quickly ramp numbers up. Could start with high = maxint.
	local low, high = 1, 1
	while rawget(tbl, high) do
		low, high = high, high * 2
		if low &gt; high then -- overflow
			high = maxint
			break
		end
	end
	while low &lt; high do
		local mid = math.ceil((low + high) / 2)
		if tbl[mid] == nil then
			high = mid - 1
		else
			low = mid
		end
	end
	return low
end
assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
	t[2^i] = i
end
print(len(t))

Lua can get tighter upper bounds using less time since it knows the capacity of the hash and array part and can use that as an upper bound; it won't have to do the "try powers of two" trick.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Lua 5.3：表索引的奇怪行为

问题

答案1

我的Lua 5.1实现中OPCODE TFORLOOP有问题，不正常工作。

Telescope nvim未显示结果。

如何在Roblox中翻转重力和玩家？

如何实现C++代码以使其类似于Lua表格？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。