Cast a pointer to unsigned to signed in Rust, is it UB?
Question
Basically, my question is whether the following Rust code can cause UB:
```rust
fn usigned_as_signed_ref<'a>(x: &'a u64) -> &'a i64 {
    assert!(*x <= i64::MAX as u64);
    unsafe { std::mem::transmute(x) }
}
```
I need it because I have a struct containing a `u64` value, but I want it to implement an interface that requires returning this value by reference. So if I can't use the above code, I would have to store the value twice, once as `i64` and once as `u64`. In other words, the situation is as follows:
```rust
trait ForeignInterface {
    fn get_value<'a>(&'a self) -> &'a i64;
}

struct MyStruct(u64);

impl ForeignInterface for MyStruct {
    fn get_value<'a>(&'a self) -> &'a i64 { usigned_as_signed_ref(&self.0) }
}
```
My thoughts so far
On a "standard system" (like x86) I expect this to be perfectly safe. In particular, `i64` should be represented as two's complement, and the same endianness should be used for `u64` and `i64`. In fact, I cannot imagine that this is different anywhere. However, I have also learned from experience that if something is not guaranteed by the spec to work everywhere, it will fail in weird circumstances. I could not find any guarantee that this works.
I was not quite sure whether this question fits StackOverflow or CodeReview better, feel free to migrate it if you think otherwise.
Edit: This question seems to indicate that it is indeed OK, but does not give any evidence.
Answer 1
Score: 2
Wrong endianness or byte representation will not make the code UB; it can make it incorrect (not doing what it wants to), but the behavior will still be perfectly defined. And also, endianness is a property of the machine, not the type, and all Rust integers are defined to be two's complement.
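To make the representation point concrete, here is a minimal sketch that compares the in-memory bytes of a `u64` with those of the same value converted by `as i64`. The helper name `same_bytes_as_signed` is made up for illustration; it uses only `to_ne_bytes` from the standard library.

```rust
/// Returns true when `x` and `x as i64` have identical in-memory bytes.
/// (Hypothetical helper for illustration; not part of the question's code.)
pub fn same_bytes_as_signed(x: u64) -> bool {
    // `as` between same-width integers keeps the bit pattern unchanged,
    // and `to_ne_bytes` exposes the bytes in the machine's native order.
    x.to_ne_bytes() == (x as i64).to_ne_bytes()
}
```

This should hold for every `u64`, including values above `i64::MAX`; those are merely read back as negative numbers, which is why the `assert!` in the question matters for correctness and not for definedness.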
The only things that determine this conversion's soundness are:
- Size. The size of the target type must be <= that of the source type. The size of `u64` and `i64` is guaranteed to be the same, so we're fine with this.
- Alignment. The alignment of the target type must be <= that of the source type. This is less well-defined as the alignment of primitive types is not guaranteed, but I'd say it is fine to assume the alignment of the signed and unsigned types is the same. If you are concerned, you can add an assert before the conversion:

  ```rust
  assert!(std::mem::align_of::<i64>() <= std::mem::align_of::<u64>());
  ```

- Uninitialized bytes. Every uninitialized byte (padding byte or uninitialized `MaybeUninit`) must be met with a possibly-uninitialized byte (padding byte or `MaybeUninit`). Primitives do not have uninitialized bytes (I don't know if this is specified somewhere, but people rely on this so this won't change), so this is trivially fine.
- Library invariants. When transmuting a type, you must make sure not to create an invalid value according to the invariants defined by the library that defines the type. Integers do not have such invariants (besides the language invariants, such as no uninitialized memory), so we're fine with that too.
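Putting the checks above together, here is a sketch of how the conversion could be written with the size and alignment checks moved to compile time. It is only an illustration under a few assumptions: the name `unsigned_as_signed_ref` is mine (the question spells it `usigned_as_signed_ref`), the pointer cast is swapped in for `transmute`, and `assert!` in a `const` item needs Rust 1.57 or later.

```rust
use std::mem::{align_of, size_of};

// Layout checks evaluated at compile time: the build fails if either is false.
const _: () = assert!(size_of::<i64>() == size_of::<u64>());
const _: () = assert!(align_of::<i64>() <= align_of::<u64>());

/// Reinterprets a `&u64` as a `&i64`, panicking if the value does not fit.
pub fn unsigned_as_signed_ref(x: &u64) -> &i64 {
    // Correctness check from the question: without it, large values
    // would be read back as negative numbers.
    assert!(*x <= i64::MAX as u64);
    // SAFETY: size and alignment are checked above, integers have no
    // padding or uninitialized bytes, and every bit pattern is a valid i64.
    unsafe { &*(x as *const u64 as *const i64) }
}

// The trait and struct from the question, for a usage example.
pub trait ForeignInterface {
    fn get_value(&self) -> &i64;
}

pub struct MyStruct(pub u64);

impl ForeignInterface for MyStruct {
    fn get_value(&self) -> &i64 {
        unsigned_as_signed_ref(&self.0)
    }
}
```

The pointer cast makes both the source and the target type explicit, whereas `transmute` infers them from context; either form performs the same reinterpretation here.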