How does my boolean HashSet from a span have 3 values?
Question
I vaguely understand this, but I would like a concrete explanation of what's happening. If I construct a `HashSet<bool>` with data that originally came from a `byte` array, why exactly does it keep duplicate values? I've tried to debug this, but once I have a `bool` array, all the elements look like standard `bool`s.

.NET Fiddle: https://dotnetfiddle.net/QOll01
```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

byte[] bytes = new byte[] { 0, 1, 2 };
ReadOnlySpan<byte> span = new(bytes);
ReadOnlySpan<bool> boolSpan = MemoryMarshal.Cast<byte, bool>(span);
bool[] bools = boolSpan.ToArray();
Console.WriteLine(string.Join(", ", bools)); // False, True, True
Console.WriteLine(new HashSet<bool>(bools).Count); // 3??
Console.WriteLine(string.Join(", ", new HashSet<bool>(bools))); // False, True, True
```
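A debugger or `ToString()` only shows `True`/`False`, so the hidden difference is easy to miss. One way to expose it is to view the `bool` array's storage as raw bytes again. The following is a minimal diagnostic sketch. It assumes .NET Core 3.0+ or .NET 5+ with C# 9 top-level statements and uses `MemoryMarshal.AsBytes`, which is not in the original question. The variable name `raw` is illustrative.

```csharp
using System;
using System.Runtime.InteropServices;

byte[] bytes = { 0, 1, 2 };
// Same reinterpretation as in the question: each byte becomes one bool.
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();

// View the bool array's storage as bytes to reveal the stored values.
ReadOnlySpan<byte> raw = MemoryMarshal.AsBytes(bools.AsSpan());

for (int i = 0; i < bools.Length; i++)
{
    // bools[i] prints only True/False; raw[i] shows the actual byte.
    Console.WriteLine($"bools[{i}] = {bools[i]}, raw byte = {raw[i]}");
}
```

`ToArray()` copies memory byte for byte, so the `2` survives into the `bool[]`, even though it prints as `True`.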
Answer 1
Score: 4
My guess would be that the memory for a `bool` can actually hold any data. If you set it explicitly, it holds 0 or 1 (the values the C# compiler emits). However, the byte can contain anything, and any value that is not zero is interpreted as `true`. Because you are creating your values from numbers, the two `true` values actually contain different numbers in memory. So while they look the same when interpreted as `bool` values, they are compared as numbers under the hood and are therefore not equal.
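The comparison described above can be checked directly. This is a sketch that assumes the same `MemoryMarshal.Cast` setup as the question (.NET Core 3.0+ or .NET 5+, C# 9 top-level statements). Both values take the `true` branch of a conditional, yet `==`, `Equals`, and `EqualityComparer<bool>.Default` (what `HashSet<bool>` uses by default) all compare the stored values, which are 1 and 2. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

byte[] bytes = { 1, 2 };
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();
bool a = bools[0]; // underlying byte 1: what the C# compiler emits for true
bool b = bools[1]; // underlying byte 2: also "true" (non-zero), but not 1

// Any non-zero byte takes the true branch of a conditional.
Console.WriteLine(a ? "a is true" : "a is false");
Console.WriteLine(b ? "b is true" : "b is false");

// Equality compares the stored values, and 1 != 2.
bool sameByOperator = a == b;
bool sameByEquals = a.Equals(b);
bool sameByComparer = EqualityComparer<bool>.Default.Equals(a, b);
Console.WriteLine($"a == b: {sameByOperator}, a.Equals(b): {sameByEquals}, comparer: {sameByComparer}");
```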
To test, I changed this:

```csharp
byte[] bytes = new byte[] { 0, 1, 2 };
```

to this:

```csharp
byte[] bytes = new byte[] { 0, 1, 2, 1, 2 };
```

and the output was this:
<pre>
False, True, True, True, True
3
False, True, True
</pre>
That appears to support the theory.
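The theory makes a sharper prediction: the set should keep one entry per distinct underlying byte, not one per logical `true`/`false`. Below is a small sketch of that check. It assumes the five-byte input from the experiment above and uses LINQ's `Distinct`, which is not part of the original answer. The normalized `b != 0` projection is included only for contrast.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Runtime.InteropServices;

byte[] bytes = { 0, 1, 2, 1, 2 };
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();

// What HashSet<bool> keeps when fed the reinterpreted bools.
int setCount = new HashSet<bool>(bools).Count;

// What the theory predicts: one entry per distinct raw byte.
int distinctRawBytes = bytes.Distinct().Count();

// What a "logical" set would hold if each byte were converted with != 0.
int distinctLogical = bytes.Select(b => b != 0).Distinct().Count();

Console.WriteLine($"HashSet: {setCount}, distinct bytes: {distinctRawBytes}, logical bools: {distinctLogical}");
```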
Answer 2
Score: 3
`bool`s are represented by 8 bits, but the C# compiler only emits and expects `0` and `1`. You've introduced `2` as a boolean. Since C# expects either 0 or 1, `bool`'s `Equals`, which compares the underlying values, returns a result that looks incorrect:
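```csharp
byte one = 1, two = 2; // reinterpreted as bools below via Unsafe.As (System.Runtime.CompilerServices)
EqualityComparer<bool>.Default.Equals(Unsafe.As<byte, bool>(ref one), Unsafe.As<byte, bool>(ref two)); // False; visually: true != true
```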
So you get 3 values in your `HashSet`.
> ## Representation of Boolean Values
> The C# and VB compilers represent `true` (`True`) and `false` (`False`) `bool` (`Boolean`) values with the single byte values `1` and `0`, respectively, and assume that any boolean values that they are working with are restricted to being represented by these two underlying values. The ECMA 335 CLI specification permits a "true" boolean value to be represented by any nonzero value. If you use boolean values that have an underlying representation other than `0` or `1`, you can get unexpected results. This can occur in `unsafe` code in C#, or by interoperating with a language that permits other values. To avoid these unexpected results, it is the programmer's responsibility to normalize such incoming values.
>
> See also https://github.com/dotnet/roslyn/issues/24652

https://github.com/dotnet/roslyn/blob/main/docs/compilers/Boolean%20Representation.md

> A CLI Boolean type occupies 1 byte in memory. A bit pattern of all zeroes denotes a value of false. A bit pattern with any one or more bits set (analogous to a non-zero integer) denotes a value of true. For the purpose of stack operations boolean values are treated as unsigned 1-byte integers.
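Following the quoted advice to normalize incoming values, one possible approach is to rewrite each element's raw byte to exactly 0 or 1 before using the data. The sketch below shows this. `NormalizeInPlace` is a hypothetical helper, not a framework API. The sketch assumes .NET Core 3.0+ or .NET 5+ with C# 9 top-level statements and relies on `MemoryMarshal.AsBytes` to reach the stored bytes. The test is done on the `byte` view, so it does not depend on how the runtime treats non-canonical `bool` values.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

byte[] bytes = { 0, 1, 2 };
bool[] bools = MemoryMarshal.Cast<byte, bool>(bytes.AsSpan()).ToArray();

int before = new HashSet<bool>(bools).Count;
NormalizeInPlace(bools);
int after = new HashSet<bool>(bools).Count;

Console.WriteLine($"Before normalizing: {before}, after: {after}");

// Hypothetical helper: forces every element's underlying byte to 0 or 1,
// the only representations the C# compiler assumes for false/true.
static void NormalizeInPlace(Span<bool> values)
{
    // Work on the raw bytes so the != 0 test sees the stored value.
    Span<byte> raw = MemoryMarshal.AsBytes(values);
    for (int i = 0; i < raw.Length; i++)
    {
        raw[i] = raw[i] != 0 ? (byte)1 : (byte)0;
    }
}
```

When you control the source bytes, converting each byte with `b != 0` instead of reinterpreting the buffer avoids the problem entirely. In-place normalization like this is mainly useful when the `bool` data arrives from interop or another reinterpreting API.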