为什么这个 HashSet 在打印时看起来是有序的?

huangapple go评论79阅读模式
英文:

Why does this HashSet look sorted when printed?

问题

Set<Integer> s = new HashSet<Integer>();
s.add(77);
s.add(0);
s.add(1);

System.out.println(s);

TreeSet<Integer> s1 = new TreeSet<Integer>();
s1.add(77);
s1.add(0);
s1.add(1);

System.out.println(s1);

输出:

s = [0, 1, 77]
s1 = [0, 1, 77]

根据教程点页面上的定义

Set 是一个不包含重复元素的通用值集合。TreeSet 是一个元素被排序的集合。

为什么 ss1 的输出都是有序的?我原本期望只有 s1 的输出是有序的。

英文:
Set&lt;Integer&gt; s = new HashSet&lt;Integer&gt;();
s.add(77);
s.add(0);
s.add(1);
 
System.out.println(s);
 
TreeSet&lt;Integer&gt; s1 = new TreeSet&lt;Integer&gt;();
s1.add(77);
s1.add(0);
s1.add(1);
 
System.out.println(s1);

Output:

s = [0, 1, 77]
s1= [0, 1, 77]

By definition on the tutorials point page
> A Set is a generic set of values with no duplicate elements. A TreeSet is a set where the elements are sorted.

Why is the output for both s and s1 sorted? I was expecting only s1's output to be sorted.

答案1

得分: 6

你说得对,s1 是排序的,因为它是一个 TreeSet,但是 s 实际上并没有被完全排序。如果你向 s 添加更多元素,你会看到一些奇怪的情况发生。

在添加了 32、12、13 和 14 之后
[0, 32, 1, 12, 77, 13, 14]

所以现在你可以看到它有些被排序了,但并不完全是有序的。造成这种情况的原因是 HashSet 默认情况下的初始容量是 16,它会根据需要在后续扩展。因此,当你向其中添加一个元素时,可以将其哈希码取模 16,以便将其放入内部的 16 个存储桶之一。由于 Integer 的哈希码就是它所表示的整数值,我们可以假设它所做的仅是执行 (要添加的元素) % 16

因此,当你添加 0、1 和 77 时,内部情况可能是这样的:

存储桶   元素
0        0
1        1
2
3
4
5
...
13       77(77 % 16 是 13,所以放在这里)
14
15
16

然后我们添加了 32。32 % 16 是 0,但是第一个存储桶中已经有了 0。幸运的是,为了防止这种情况的碰撞,HashSet 使用了链表而不是单个元素,所以我们只需将 32 追加到包含 0 的链表中。

存储桶   元素
0        0 -> 32
1        1
2
3
4
5
...
13       77
14
15
16

对于 12、13 和 14,它们的处理方式也是相同的:

存储桶   元素
0        0 -> 32
1        1
2
3
4
5
...
12       12
13       77 -> 13
14       14
15
16

如果按顺序遍历存储桶,打印每个存储桶中的链表,你会得到 [0, 32, 1, 12, 77, 13, 14]

可以在这里查看它的运行结果

英文:

You are right that s1 is sorted because it's a TreeSet, but s isn't really sorted. If you add a few more elements to s, you can see something weird happening.

After adding 32, 12, 13, and 14
[0, 32, 1, 12, 77, 13, 14]

So now you see that it's somewhat ordered, but not really. The reason for that is that HashSet has a capacity of 16 by default, and it will grow later as the need arises. So when you add an element to it, picture it taking the hashcode of that element, modulo 16 so that it fits inside the internal "list" of 16 buckets. Since the hashcode of an Integer is the int it represents, we can pretend all it does is do (the element to be added) % 16.

So when you added, 0, 1, and 77, it probably looked something like this internally:

Bucket   Elements
0        0
1        1
2
3
4
5
...
13       77 (77 % 16 is 13, so it&#39;s placed here)
14
15
16

Then we add 32. 32 % 16 is 0, but we already have 0 in the first bucket. Luckily, to prevent collisions such as this one, HashSets use linked lists instead of single elements, so we just append 32 to our linked list containing 0.

Bucket   Elements
0        0 -&gt; 32
1        1  
2
3
4
5
...
13       77
14
15
16

It works the same way for 12, 13 and 14 too:

Bucket   Elements
0        0 -&gt; 32
1        1
2
3
4
5
...
12       12
13       77 -&gt; 13
14       14
15
16

And if you go through the buckets in order, printing each linked list in that bucket, you get [0, 32, 1, 12, 77, 13, 14]

See it run

答案2

得分: 3

这只是偶然发生的。

HashSetsHashMap 的一种特殊实现,但它们仍然使用 hashCode 将对象放入桶中。

对于 Integer 来说,标准的 hashCode 就是 int 值本身。

将这样低的值与负载因子和桶算法结合在一起,会导致它们根据该代码被放置在不同的桶中,但恰好这些桶是连续的。如果你将值更改为较大的值,它们将不会被排序,因为算法只使用 hashCode 的一部分来选择桶,因此它们连续的机会会减少。对于分布随机的更大数量的数字,这也是适用的。

Set<Integer> s = new HashSet<Integer>();

s.add(57999999);
s.add(67999999);
s.add(77999999);

System.out.println(s);

TreeSet<Integer> s1 = new TreeSet<Integer>();

s1.add(57999999);
s1.add(67999999);
s1.add(77999999);

System.out.println(s1);

在我的 Windows 系统上运行 Java 14,它们的输出如下:

[67999999, 77999999, 57999999]
[57999999, 67999999, 77999999]
英文:

This happens only by chance.

HashSets are a special implementation of HashMap but they still use the hashCode to place object in buckets.

The standard hashCode for an Integer is the int value itself.

Specifying such low values coupled with the load factor and bucket algorithm causes them to be placed in different buckets based on that code but the buckets happen to be sequential. If you change values to something larger they will not be ordered because the algorithm only uses part of the hashCode to select the bucket, so the chance of them being sequential is reduced. This would also be the case for much larger sets of randomly distributed numbers.

Set&lt;Integer&gt; s = new HashSet&lt;Integer&gt;();

s.add(57999999);
s.add(67999999);
s.add(77999999);

System.out.println(s);

TreeSet&lt;Integer&gt; s1 = new TreeSet&lt;Integer&gt;();

s1.add(57999999);
s1.add(67999999);
s1.add(77999999);


System.out.println(s1);

On my machine runing Windows and Java 14, they print as follows:

[67999999, 77999999, 57999999]
[57999999, 67999999, 77999999]


</details>



# 答案3
**得分**: 1

`TreeSet` 实现了 `SortedSet` 接口,这就是为什么你添加到其中的所有元素在实际添加之前都会被排序。

在你的情况下,你正在使用 `Integer` 进行操作。它实现了 `Comparable` 接口,具有默认的比较机制。而你的 `TreeSet` 则使用这个 Integer 的 `Comparable` 实现来在将其添加到集合之前进行比较。

因此,通常在使用 `TreeSet` 时有两种可能的解决方案:

1. 你的 `TreeSet` 应该只与实现了 `Comparable` 接口的对象一起使用,因为它们需要具有 `compareTo` 实现。`TreeSet` 将使用这个实现来执行排序。
2. 如果你的类没有实现 `Comparable`,在创建 `TreeSet` 实例时可以传入你的 `Comparator` 实现。

`HashSet` 在内部不执行任何排序操作,它只是随机的。元素根据它们的哈希码分配到不同的桶中,在你的情况下,它们的哈希码匹配。

更新:`HashSet` 并不是随机的,我的回答可能会让人感到困惑。如果感兴趣的话,可以查看[这个关于一些 `HashSet` 原理的解释](https://stackoverflow.com/a/64307640/9272947)。

<details>
<summary>英文:</summary>

`TreeSet` implements `SortedSet`, that&#39;s why all elements that you&#39;ve added to it will be sorted before actually adding.

In your case you&#39;re working with `Integer`. It&#39;s implements `Comparable`, it has default compare mechanism. And you `TreeSet` using this Integer&#39;s `Comparable` implementation to compare your Integers before adding them to collection.

So, in general when working with `TreeSet` you have two possible solutions:

 1. Your `TreeSet` should work only with objects that implements `Comparable` interface, because they need to have `compareTo` implementation. This implementation will be used by `TreeSet` to perform sorting.
 2. If your class isn&#39;t implementing `Comparable` you can pass to constructor your `Comparator` implementation when creating an instance of a `TreeSet`.

`HashSet` do not perform any sorting inside, it&#39;s just a random here. Elements goes to different buckets based on their hashcodes and in your case they matched up so.

UPD: `HashSet` it&#39;s not a random, my answer may be confusing, take a look at [this explanation of some `HashSet` principles](https://stackoverflow.com/a/64307640/9272947) if interested.

</details>



# 答案4
**得分**: 1

> 为什么 Set 接口返回一个有序列表?

在 Java 中,[Set 接口](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html),或者在一般情况下 [集合(Set)](https://en.wikipedia.org/wiki/Set_(mathematics)) 的概念,可以被视为*抽象数据类型*,作为行为规范的*契约*,它声明了一个“不包含重复元素的集合”。

因此,`Set<E>` 接口本身并*不会*返回任何内容;然而,它可以通过[不同的方式进行实现](https://docs.oracle.com/javase/tutorial/collections/implementations/set.html)。

`TreeSet<E>` 的定义[说明](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/TreeSet.html):
&gt; *元素是有序的*,使用它们的[自然排序](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Comparable.html),或者通过在创建集合时提供的[比较器](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Comparator.html),具体取决于使用哪个构造函数。

另一方面,`HashSet<E>` 的定义[说明](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashSet.html):
&gt; 它不保证顺序会随时间保持恒定。

<details>
<summary>英文:</summary>

&gt; Why is the Set interface returning an ordered list?

[Set Interface](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Set.html) in Java, or a [Set](https://en.wikipedia.org/wiki/Set_(mathematics)) concept, in general, can be thought as an *Abstract Data Type*, as a *contract* of the behaviour specification, which declares *a collection that contains no duplicate elements*.

So, `Set&lt;E&gt;` interface, itself, does *not* return anything; however, it can be implemented [in different ways](https://docs.oracle.com/javase/tutorial/collections/implementations/set.html). 

`TreeSet&lt;E&gt;` definition [says, that](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/TreeSet.html):
&gt; The *elements are ordered* using their [natural ordering](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Comparable.html), or by a [Comparator](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Comparator.html) provided at set creation time, depending on which constructor is used.

`HashSet&lt;E&gt;` definition, on the other hand, [says, that](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashSet.html):
&gt; it does not guarantee that the order will remain constant over time.

</details>



huangapple
  • 本文由 发表于 2020年10月12日 02:13:06
  • 转载请务必保留本文链接:https://go.coder-hub.com/64307466.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定