Julia字典(和集合)在运行之间稳定吗?

huangapple go评论62阅读模式
英文:

Are Julia dictionaries (and sets) stable between runs?

问题

我知道,在Julia中,字典不会保留键插入的原始顺序,如这里所描述,但是使用给定插入顺序构建的字典(和集合)是否保留它们在运行之间使用的任意顺序?这很重要,因为这意味着可以假定任意顺序在运行之间是稳定的,例如,如果我运行以下代码,顺序会被保留

dictionary = Dict(1 => 77, 2 => 66, 3 => 1, 50 => 2)
print(collect(keys(dictionary)))

因为在每次运行时它总是打印出

[50, 2, 3, 1]

这只是为了说明我的意思,一般来说,对于所有键类型(甚至对于整数)是否都是这样的?在这个问题中,有一个由@PM 2Ring提供的答案,其中描述了对于Python集合,这对于所有类型来说并不总是成立,因为

默认情况下,str、bytes和datetime对象的__hash__()值会与不可预测的随机值“混淆”起来

英文:

I know that, in Julia, dictionaries do not preserve the original ordering in which keys has been inserted as described here, but do dictionaries (and sets) constructed with a given insertion order preserve the arbitrary ordering they use between runs? This is relevant since it would mean that the arbitrary order can be assumed stable between runs, for example, if I run the following code the order is preserved

dictionary = Dict(1 => 77, 2 => 66, 3 => 1, 50 => 2)
print(collect(keys(dictionary)))

since at each run it always prints

[50, 2, 3, 1]

This is just to show what I mean, in general is this true for all key types (or even for integers)? in this question there is an answer by @PM 2Ring which describes that for Python sets this is not true for all types since

> By default, the hash() values of str, bytes and datetime objects
> are “salted” with an unpredictable random value

答案1

得分: 5

简短答案是:你不能保证这一点,也不应该依赖它成立,尤其是在不同机器、版本等情况下。

特别是集合(Sets)非常紧密地遵循集合的数学定义,并且没有顺序。

如果你需要依赖键或元素的顺序,你可以使用OrderedCollections包,它曾经是DataStructures的一部分。它提供了有序版本的集合和字典。

英文:

The short answer is: you can't guarantee it, and shouldn't rely on it to be the case, especially across machines, version, etc.

Sets in particular follow very closely the mathematical definition of a set, and have no ordering.

If you need to rely on the order of keys or elements, you can use the OrderedCollections package, which used to be part of DataStructures. It has ordered versions of sets and dictionaries.

答案2

得分: 2

官方答案就如Timothée所提供的(参见其他回答)。尽管如此,要探究Julia的内部(因为它主要是用Julia编写的)并找到这类问题的答案是很容易的。例如,通过以下方式:

@edit Dict(1 => 77, 2 => 66, 3 => 1, 50 => 2)

通过检查(以及OP中的示例),Dict 在元素排序中根本不使用随机化(这是合理的,否则它将不得不提供对随机生成器的访问以允许一致的运行复制)。顺序由键上的 hash 函数确定。这个哈希在不同的实现中可能会发生变化... 除非你自己定义。以下是一个示例:

julia> struct MyInt
           v::Int
       end
julia> Base.hash(a::MyInt) = hash(a.v)

julia> Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(50) => 2
  MyInt(2)  => 66
  MyInt(3)  => 1
  MyInt(1)  => 77

julia> dictionary = Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(50) => 2
  MyInt(2)  => 66
  MyInt(3)  => 1
  MyInt(1)  => 77

julia> print(collect(keys(dictionary)))
MyInt[MyInt(50), MyInt(2), MyInt(3), MyInt(1)]

现在,重新定义 hash 函数会改变顺序。请注意,这个哈希函数使用了原始的 Int 哈希,但为了稳定使用,一个人会定义一个明确的(希望是好的)哈希(有很多关于如何这样做的资料):

julia> Base.hash(a::MyInt) = hash(a.v) + 0x89

julia> dictionary = Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(3)  => 1
  MyInt(1)  => 77
  MyInt(50) => 2
  MyInt(2)  => 66

julia> print(collect(keys(dictionary)))
MyInt[MyInt(3), MyInt(1), MyInt(50), MyInt(2)]

请注意,最后一个 Dict 以不同的顺序打印出来。

英文:

The official answer is as given by Timothée (see other answers). Having said that, it is easy to explore the insides of Julia (as it is mostly written in Julia) and find answers to such questions. For example by:

@edit Dict(1 => 77, 2 => 66, 3 => 1, 50 => 2)

By inspection (and example in OP) Dict does not use randomization in ordering elements at all (reasonable, as otherwise it would have to provide access to the random generator to allow consistent run replication). The order is determined by the hash function on the keys. This hash can change between implementations... unless you define your own. An example follows:

julia> struct MyInt
           v::Int
       end
julia> Base.hash(a::MyInt) = hash(a.v)

julia> Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(50) => 2
  MyInt(2)  => 66
  MyInt(3)  => 1
  MyInt(1)  => 77

julia> dictionary = Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(50) => 2
  MyInt(2)  => 66
  MyInt(3)  => 1
  MyInt(1)  => 77

julia> print(collect(keys(dictionary)))
MyInt[MyInt(50), MyInt(2), MyInt(3), MyInt(1)]

Now, redefining the hash function changes order. Note, that this hash function uses the original Int hash, but for stable use, one would define an explicit (hopefully good) hash (many sources on how to do so):

julia> Base.hash(a::MyInt) = hash(a.v)+0x89

julia> dictionary = Dict(MyInt(1) => 77, MyInt(2) => 66, MyInt(3) => 1, MyInt(50) => 2)
Dict{MyInt, Int64} with 4 entries:
  MyInt(3)  => 1
  MyInt(1)  => 77
  MyInt(50) => 2
  MyInt(2)  => 66

julia> print(collect(keys(dictionary)))
MyInt[MyInt(3), MyInt(1), MyInt(50), MyInt(2)]

Note the last Dict is printed in a different order.

huangapple
  • 本文由 发表于 2023年3月4日 06:20:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/75632333.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定