英文:
Lua 5.3: strange behavior with table index
问题
我在处理表格时发现了这种奇怪的行为:
folder1 = {1, 2, 3}
table.insert(folder1, "test") --4
folder1[5] = "test2"
print("#folder1: " .. #folder1) --结果是 5
folder2 = {1, 2, 3}
table.insert(folder2, "test") --4
folder2[6] = "test2"
print("#folder2: " .. #folder2) --结果是 6
folder3 = {1, 2, 3}
table.insert(folder3, "test") --4
folder3[7] = "test2"
print("#folder3: " .. #folder3) --结果是 4
第一个情况是正确的,因为它是一个没有间隙的数组。第三个情况可以理解,因为#table返回nil索引之前的最后一个索引。
但是第二种情况是怎么回事?我本来期望在这里得到索引4。
而且似乎只有在使用table.insert
之前才会发生这种情况。如果我删除插入命令,输出又会如预期一样:
folder1 = {1, 2, 3}
folder1[4] = "test2"
print("#folder1: " .. #folder1) --结果是 4
folder2 = {1, 2, 3}
folder2[5] = "test2"
print("#folder2: " .. #folder2) --结果是 3
folder3 = {1, 2, 3}
folder3[6] = "test2"
print("#folder3: " .. #folder3) --结果是 3
有人能解释一下吗?
我已经阅读了Lua文档。函数rawlen(table)
看起来像是返回表格的真实大小的函数,但我发现rawlen
也只返回非nil索引之前的最后一个索引。
英文:
so I was playing around with tables a bit and found this strange behavior:
folder1 = {1, 2, 3}
table.insert(folder1, "test") --4
folder1[5] = "test2"
print("#folder1: " .. #folder1) --yields 5
folder2 = {1, 2, 3}
table.insert(folder2, "test") --4
folder2[6] = "test2"
print("#folder2: " .. #folder2) --yields 6
folder3 = {1, 2, 3}
table.insert(folder3, "test") --4
folder3[7] = "test2"
print("#folder3: " .. #folder3) --yields 4
the first case is correct since it's an array with no gap. The third case is understandable since #table returns the last index before a nil index.
But what happens in case two? I was expecting to receive the index 4 here.
Also it only seems to happen with a table.insert
before. If I remove the insert command, the output is as expected again:
folder1 = {1, 2, 3}
folder1[4] = "test2"
print("#folder1: " .. #folder1) --yields 4
folder2 = {1, 2, 3}
folder2[5] = "test2"
print("#folder2: " .. #folder2) --yields 3
folder3 = {1, 2, 3}
folder3[6] = "test2"
print("#folder3: " .. #folder3) --yields 3
Can someone explain this to me?
I was reading into the Lua documentation. the function rawlen(table)
was looking like the function that would return the true size of the table, but I figured out, that rawlen
also just returns the last index before a non-nil index.
答案1
得分: 1
让我们来解释一下这里发生的事情。
第一个例子:在操作之后,folder1 = {1, 2, 3, "test", 5}
。
这是一个合适的序列,从键 1
到 5
的所有条目都不为 nil,所以长度是 5
,正如预期的那样。
第二个例子:在操作之后,folder2 = {1, 2, 3, "test", nil, "test2"}
。这不是一个合适的“序列”,它有洞。你的解释是
第三种情况是可以理解的,因为 #table 返回 nil 索引之前的最后一个索引。
但是这个定义是错误的;#table
返回 任何 索引 i
,使得 t[i] ~= nil
并且 t[i+1] == nil
。它 不 保证是 最后 或 第一个 索引,实际上,出于性能原因,它通常不会是最后一个或第一个索引。
在第二个例子中,4
和 6
都是 #folder2
的有效值。你得到的是 6
,所以一切都正常。
第三个例子:在操作之后,folder3 = {1, 2, 3, "test", nil, nil, "test2"}
。#folder3
的有效值可以是 4
或 7
。较大的间隙导致 PUC Lua 计算 #folder3
为 7
,这与实现细节有关。
你的第二组示例可以类似地解释:
#{1, 2, 3, "test2"}
是4
,如预期的那样#{1, 2, 3, nil, "test2"}
可能是3
或5
;你得到的是3
,这是正常的#{1, 2, 3, nil, nil, "test2"}
可能是3
或6
;你得到的是6
,这也是正常的
Lua 表是使用数组和哈希部分实现的。 "数组" 部分的条目不会总是落在数组部分,但是 Lua 希望 {[3] = 3, [2] = 2, [1] = 1}
和 {1, 2, 3}
在程序员的角度看起来是一样的。因此,Lua 需要为哈希表定义 #t
;这意味着它必须以某种方式找到长度,不能仅依赖于内部的长度字段。即使对于数组来说,这也行不通,因为程序员可能会在数组的中间设置条目为 nil
,然后在数组的末尾设置条目。
如果 Lua 必须返回第一个或最后一个边界索引,它要么必须进行额外的繁琐的开销的记录,要么必须使用线性搜索。这两种选项都不可行。
Lua 所做的是相当聪明的:它使用 二分搜索 来找到 任何 边界:如果 t[i]
是 nil
,那么你知道将会有一个边界 j < i
,使得 t[j + 1]
为 nil
。如果 t[i]
不是 nil
,那么你知道将会有一个边界 j >= i
,使得 t[j + 1]
是 nil
- 你可以在每一步中将边界的搜索空间减半。为了找到 i
的初始上界,你可以检查二的幂 i = 2^n
,直到找到一个 nil
为止。此外,你可以使用最大安全整数作为上界。
以下是我一段时间前在纯 Lua 中重新实现表长度运算符的代码,其中包括一些测试,以便给你一个想法:
local rawget = rawget
local maxint = math.maxinteger or 2^53
function len(tbl)
if tbl[1] == nil then
return 0
end
if tbl[maxint] then
return maxint
end
-- Quickly ramp numbers up. Could start with high = maxint.
local low, high = 1, 1
while rawget(tbl, high) do
low, high = high, high * 2
if low > high then -- overflow
high = maxint
break
end
end
while low < high do
local mid = math.ceil((low + high) / 2)
if tbl[mid] == nil then
high = mid - 1
else
low = mid
end
end
return low
end
assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
t[2^i] = i
end
print(len(t))
Lua 可以以更少的时间获得更紧密的上界,因为它知道哈希和数组部分的容量,并可以将其用作上界;它不必执行“尝试二的幂”的技巧。
英文:
Let's unpack what's happening here.
First example: After the operations, folder1 = {1, 2, 3, "test", 5}
.
This is a proper sequence, all entries from key 1
to 5
are non-nil, so the length is 5
, just as expected.
Second example: After the operations, folder2 = {1, 2, 3, "test", nil, "test2"}
. This is not a proper "sequence", it has holes. Your explanation is
> The third case is understandable since #table returns the last index before a nil index.
but this definition is wrong; #table
returns any index i
such that t[i] ~= nil
and t[i+1] == nil
. It is not guaranteed to be the last or the first index, and in practice it often won't be for performance reasons <sup>1</sup>.
In the second example, both 4
and 6
are valid values for #folder2
. You are getting 6
, so everything is fine here.
Third example: After the operations, folder3 = {1, 2, 3, "test", nil, nil, "test2"}
. Both 4
and 7
are valid values for #folder3
. The larger gap leads PUC Lua to evaluate #folder3
to 7
, which has to do with the implementation details. <sup>1</sup>
Your second set of examples can be explained analogeously:
#{1, 2, 3, "test2"}
is4
, as expected#{1, 2, 3, nil, "test2"}
may be either3
or5
; you get3
, which is fine#{1, 2, 3, nil, nil, "test2"}
may be either3
or6
; you get ``, which is fine
<sup>1</sup> Lua tables are implemented using an array and a hash part. The "array" part entries won't always land in the array part, but Lua wants {[3] = 3, [2] = 2, [1] = 1}
and {1, 2, 3}
to be the same from the programmer's point of view. Thus Lua needs to define #t
for hash tables; this means it has to find the length somehow, it can't just rely on an internal length field. Even for arrays this won't work, since the programmer may set entries to nil
in the middle of the array, and later on, at the end of the array.
If Lua had to return the first or last boundary index, it would either have to do additional bookkeeping with significant overhead, or it would have to use a linear search. Both aren't viable options.
What Lua does instead is quite clever: It uses a binary search to find any boundary: If t[i]
is nil
, you know that there will be a boundary j < i
such that t[j + 1]
is a nil
. If t[i]
is not nil
, you know that there will be a boundary j >= i
such that t[j + 1]
is nil
- you can halve the search space for the boundary in each step. To find an initial upper bound on i
, you can just check powers of two i = 2^n
until you hit a nil
. Furthermore, you can use the max safe integer as an upper bound.
Here is a reimplementation of the table length operator in pure Lua I did a while ago, with some tests included to give you an idea:
local rawget = rawget
local maxint = math.maxinteger or 2^53
function len(tbl)
if tbl[1] == nil then
return 0
end
if tbl[maxint] then
return maxint
end
-- Quickly ramp numbers up. Could start with high = maxint.
local low, high = 1, 1
while rawget(tbl, high) do
low, high = high, high * 2
if low > high then -- overflow
high = maxint
break
end
end
while low < high do
local mid = math.ceil((low + high) / 2)
if tbl[mid] == nil then
high = mid - 1
else
low = mid
end
end
return low
end
assert(len{} == 0)
local t = {}; for i = 1, 1e6 do t[i] = i end; assert(len(t) == 1e6)
print(len{1, 2, 3, 4, 5, 6})
print(len{1, 2, 3, nil, 5, 6})
t = {}
for i = 0, 500 do
t[2^i] = i
end
print(len(t))
Lua can get tighter upper bounds using less time since it knows the capacity of the hash and array part and can use that as an upper bound; it won't have to do the "try powers of two" trick.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论