How does my boolean HashSet from a span have 3 values?

Question

I vaguely understand this, but I would like a concrete explanation of what's happening. If I construct a `HashSet<bool>` with data that originally came from a byte array, why exactly does it keep duplicate values? I've tried to debug this, but once I have a bool array, all the elements look like ordinary bools.

.NET Fiddle: https://dotnetfiddle.net/QOll01

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

byte[] bytes = new byte[] { 0, 1, 2 };
ReadOnlySpan<byte> span = new(bytes);
// Reinterpret each byte as a bool; no conversion or normalization happens.
ReadOnlySpan<bool> boolSpan = MemoryMarshal.Cast<byte, bool>(span);
bool[] bools = boolSpan.ToArray();

Console.WriteLine(string.Join(", ", bools)); // False, True, True

Console.WriteLine(new HashSet<bool>(bools).Count); // 3??
Console.WriteLine(string.Join(", ", new HashSet<bool>(bools))); // False, True, True
```

Answer 1

Score: 4


My guess would be that the byte backing a bool can actually hold any value. If you assign a bool normally, it holds 0 or 1, but it can be populated with anything, and any value that is not zero is interpreted as true. Because you are creating your values from numbers, the two true values actually contain different numbers in memory (1 and 2). So while they are the same when interpreted as bool values, they are compared as numbers under the hood and are therefore not equal.

To test, I changed this:

byte[] bytes = new byte[] { 0, 1, 2 };

to this:

byte[] bytes = new byte[] { 0, 1, 2, 1, 2 };

and the output was this:
<pre>
False, True, True, True, True
3
False, True, True
</pre>
The repeated 1s and 2s were merged, but 1 and 2 still count as different values, which appears to support the theory.
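To make the hidden difference visible, here is a minimal sketch that uses the same `MemoryMarshal.Cast` reinterpretation as the question, then views the resulting bool array as raw bytes again with `MemoryMarshal.AsBytes` (assuming .NET Core 2.1 or later, where both APIs exist). The bools print like ordinary values, but the underlying bytes are unchanged.

```csharp
using System;
using System.Runtime.InteropServices;

// Reinterpret the bytes as bools, exactly like the question does.
// No conversion happens: each bool keeps its original byte.
byte[] bytes = { 0, 1, 2, 1, 2 };
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();

// Printed as bools, every nonzero entry shows up as "True".
Console.WriteLine(string.Join(", ", bools));

// Viewing the same memory as bytes exposes the values that actually differ.
byte[] raw = MemoryMarshal.AsBytes(bools.AsSpan()).ToArray();
Console.WriteLine(string.Join(", ", raw)); // 0, 1, 2, 1, 2
```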

Answer 2

Score: 3


bools are represented by 8 bits, but C#'s compiler only emits (and expects) 0 and 1. You've introduced 2 as a boolean. Since C# expects either 0 or 1, `bool.Equals`, which `EqualityComparer<bool>.Default` (and therefore `HashSet<bool>`) uses, returns an unexpected result.

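For example, with the bytes 1 and 2 reinterpreted as bools via `Unsafe.As` (this needs `using System.Collections.Generic;` and `using System.Runtime.CompilerServices;`):

```csharp
byte one = 1, two = 2;
EqualityComparer<bool>.Default.Equals(Unsafe.As<byte, bool>(ref one), Unsafe.As<byte, bool>(ref two)); // False; visually: true != true
```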

Because the two "true" values compare as unequal, the set keeps both, so you get 3 values in your HashSet.

> ## Representation of Boolean Values
> The C# and VB compilers represent true (True) and false (False) bool (Boolean) values with the single byte values 1 and 0, respectively, and assume that any boolean values that they are working with are restricted to being represented by these two underlying values. The ECMA 335 CLI specification permits a "true" boolean value to be represented by any nonzero value. If you use boolean values that have an underlying representation other than 0 or 1, you can get unexpected results. This can occur in unsafe code in C#, or by interoperating with a language that permits other values. To avoid these unexpected results, it is the programmer's responsibility to normalize such incoming values.
>
>
> See also https://github.com/dotnet/roslyn/issues/24652

https://github.com/dotnet/roslyn/blob/main/docs/compilers/Boolean%20Representation.md
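Following the advice to normalize incoming values, here is a minimal sketch that converts each byte with `b != 0` instead of reinterpreting memory with `MemoryMarshal.Cast`. This is one possible normalization, not something the quoted document prescribes. A comparison always yields a canonical `bool`, so the set behaves as expected:

```csharp
using System;
using System.Collections.Generic;

byte[] bytes = { 0, 1, 2 };

// Convert instead of reinterpreting: `b != 0` produces a real comparison
// result, so every true is stored as 1 and every false as 0.
bool[] bools = Array.ConvertAll(bytes, b => b != 0);

var set = new HashSet<bool>(bools);
Console.WriteLine(string.Join(", ", bools)); // False, True, True
Console.WriteLine(set.Count);                // 2
```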

> A CLI Boolean type occupies 1 byte in memory. A bit pattern of all zeroes denotes a value of false. A bit pattern with any one or more bits set (analogous to a non-zero integer) denotes a value of true. For the purpose of stack operations boolean values are treated as unsigned 1-byte integers.

ECMA 335 specification (§III.1.1.2) p. 293
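The spec's rule (all-zero bits mean false, any set bit means true, and booleans behave like unsigned 1-byte integers) also suggests a way to repair data that has already been reinterpreted, for example bools that arrived through interop. A minimal sketch, assuming you own the array and may modify it: view the bools as bytes with `MemoryMarshal.AsBytes` and rewrite every nonzero byte to 1.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Start from the problematic data: bools whose true values are 1 and 2.
byte[] bytes = { 0, 1, 2 };
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();

// Treat the bools as unsigned bytes and collapse any nonzero pattern to 1,
// the representation the C# compiler expects for true.
Span<byte> raw = MemoryMarshal.AsBytes(bools.AsSpan());
for (int i = 0; i < raw.Length; i++)
{
    raw[i] = raw[i] != 0 ? (byte)1 : (byte)0;
}

var set = new HashSet<bool>(bools);
Console.WriteLine(set.Count); // 2
```

Because `ToArray` copies the data, the original `bytes` array is left untouched; only the bool array is normalized.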

huangapple
  • This article was published on February 24, 2023 at 10:02:04
  • When reposting, please keep this article's link: https://go.coder-hub.com/75552002.html