Get a `&mut u8` reference to part of a `&mut u32`

Question

To convert a `u32` to its bytes, I know there are already `to_ne_bytes`, `to_le_bytes`, and `to_be_bytes`.

However, is there any sound and cross-platform way to convert a `&mut u32` into references to its individual bytes, as `&mut u8`?

Like this:

fn as_be_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    todo!()
}

fn main() {
    let mut num: u32 = 0x12345678;

    // Prints `0x12345678 - [12, 34, 56, 78]`
    println!("{:#x} - {:x?}", num, num.to_be_bytes());

    let parts = as_be_bytes_mut(&mut num);
    *parts[2] = 0xfe;

    // Should print `0x1234fe78 - [12, 34, fe, 78]`
    println!("{:#x} - {:x?}", num, num.to_be_bytes());
}

The rationale is that it should be theoretically possible, because there are no invalid states of a u32, no matter how you modify its underlying bytes.


Attempt:

fn as_ne_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    unsafe {
        let ptr: *mut u8 = (num as *mut u32).cast();
        [
            &mut *ptr.add(0),
            &mut *ptr.add(1),
            &mut *ptr.add(2),
            &mut *ptr.add(3),
        ]
    }
}

fn main() {
    let mut num: u32 = 0x12345678;

    println!("{:#x} - {:x?}", num, num.to_be_bytes());

    let parts = as_ne_bytes_mut(&mut num);
    *parts[1] = 0xfe;

    println!("{:#x} - {:x?}", num, num.to_be_bytes());
}

Output (on a little-endian target):

0x12345678 - [12, 34, 56, 78]
0x1234fe78 - [12, 34, fe, 78]

I think (tm) this is sound, because a `u32` is a packed array of four `u8`s, a `u8` is always correctly aligned, and the lifetimes should also match. I haven't found a way to implement `as_be_bytes_mut`/`as_le_bytes_mut` yet. Also, I'm not 100% sure this is sound, so some feedback would help.

Answer 1

Score: 2

It should be sound to get a `&mut [u8; 4]` out of it.

pub fn as_ne_bytes_mut(num: &mut u32) -> &mut [u8; 4] {
    unsafe {
        // SAFETY: `u32` and `[u8; 4]` have the same size, `[u8; 4]` has
        // alignment 1, and every bit pattern is valid for both types.
        let arr: *mut [u8; 4] = (num as *mut u32).cast();
        &mut *arr
    }
}

This is much better than `[&mut u8; 4]` since it's a no-op. However, you aren't going to be able to create `le` and `be` versions unless you swap the bytes around and return an `impl DerefMut<Target = [u8; 4]>` guard that swaps the bytes back when it's dropped.
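
The guard idea could be sketched roughly as follows. This is only an illustration under some assumptions: the names `BeBytesGuard`, `be_bytes_guard`, and `example_usage` are made up for this sketch, and instead of swapping the bytes in place it keeps a big-endian copy and writes it back on drop, which avoids `unsafe` entirely.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical guard: exposes the big-endian bytes of a `u32` and
/// writes them back to the borrowed integer when it is dropped.
pub struct BeBytesGuard<'a> {
    num: &'a mut u32,
    bytes: [u8; 4],
}

/// Hypothetical constructor for the guard (name chosen for this sketch).
pub fn be_bytes_guard(num: &mut u32) -> BeBytesGuard<'_> {
    let bytes = num.to_be_bytes();
    BeBytesGuard { num, bytes }
}

impl Deref for BeBytesGuard<'_> {
    type Target = [u8; 4];

    fn deref(&self) -> &[u8; 4] {
        &self.bytes
    }
}

impl DerefMut for BeBytesGuard<'_> {
    fn deref_mut(&mut self) -> &mut [u8; 4] {
        &mut self.bytes
    }
}

impl Drop for BeBytesGuard<'_> {
    fn drop(&mut self) {
        // Convert the (possibly modified) big-endian bytes back to a native `u32`.
        *self.num = u32::from_be_bytes(self.bytes);
    }
}

/// Modifies byte 2 (in big-endian order) of a sample value through the guard.
pub fn example_usage() -> u32 {
    let mut num: u32 = 0x12345678;
    {
        let mut guard = be_bytes_guard(&mut num);
        guard[2] = 0xfe;
    } // The guard is dropped here and writes the bytes back.
    num
}
```

One trade-off of this design is that the write-back only happens in `drop`, so leaking the guard (for example with `std::mem::forget`) silently discards the modifications.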

I would probably go with four functions that return one byte each instead of `le` and `be` functions that return `[&mut u8; 4]`. Although if you use them the same way, they should optimize to the same thing.

pub fn most_sig_byte_0(num: &mut u32) -> &mut u8 {
    if cfg!(target_endian = "big") {
        &mut as_ne_bytes_mut(num)[0]
    } else {
        &mut as_ne_bytes_mut(num)[3]
    }
}

pub fn most_sig_byte_1(num: &mut u32) -> &mut u8 {
    if cfg!(target_endian = "big") {
        &mut as_ne_bytes_mut(num)[1]
    } else {
        &mut as_ne_bytes_mut(num)[2]
    }
}

pub fn most_sig_byte_2(num: &mut u32) -> &mut u8 {
    if cfg!(target_endian = "big") {
        &mut as_ne_bytes_mut(num)[2]
    } else {
        &mut as_ne_bytes_mut(num)[1]
    }
}

pub fn most_sig_byte_3(num: &mut u32) -> &mut u8 {
    if cfg!(target_endian = "big") {
        &mut as_ne_bytes_mut(num)[3]
    } else {
        &mut as_ne_bytes_mut(num)[0]
    }
}

Miri thinks these are fine, at least. You could make them a single const generic function, but it would be less ergonomic to give it the `0..4` bound. ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=35158fde09016e23b8eaa064bfd209c7))
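
For comparison, a const generic version might look like the sketch below. The name `most_sig_byte` and the helper `example_usage` are assumptions made for illustration, and the `0..4` bound is enforced with a plain `assert!` here (a run-time panic for an out-of-range `N`) rather than at the type level, which is the ergonomic gap mentioned above.

```rust
/// Native-endian view of the bytes of a `u32`.
pub fn as_ne_bytes_mut(num: &mut u32) -> &mut [u8; 4] {
    // SAFETY: `u32` and `[u8; 4]` have the same size, `[u8; 4]` has
    // alignment 1, and every bit pattern is valid for both types.
    unsafe { &mut *(num as *mut u32).cast::<[u8; 4]>() }
}

/// Hypothetical const generic variant: `N = 0` selects the most
/// significant byte, `N = 3` the least significant one.
pub fn most_sig_byte<const N: usize>(num: &mut u32) -> &mut u8 {
    // `N` is known at compile time, so this check is expected to fold away
    // for valid values; an out-of-range `N` panics in this sketch.
    assert!(N < 4, "byte index out of range");
    let idx = if cfg!(target_endian = "big") { N } else { 3 - N };
    &mut as_ne_bytes_mut(num)[idx]
}

/// Overwrites the third most significant byte of a sample value.
pub fn example_usage() -> u32 {
    let mut num: u32 = 0x12345678;
    *most_sig_byte::<2>(&mut num) = 0xfe;
    num
}
```

Callers have to spell the index with turbofish syntax (`most_sig_byte::<2>`), which is arguably noisier than four named functions.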

Answer 2

Score: 1

This can be achieved with the help of `cfg(target_endian = "...")`:

pub fn as_ne_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    unsafe {
        let ptr: *mut u8 = (num as *mut u32).cast();
        [
            &mut *ptr.add(0),
            &mut *ptr.add(1),
            &mut *ptr.add(2),
            &mut *ptr.add(3),
        ]
    }
}

pub fn as_be_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let mut b = as_ne_bytes_mut(num);

    #[cfg(target_endian = "little")]
    b.reverse();

    b
}

pub fn as_le_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let mut b = as_be_bytes_mut(num);
    b.reverse();
    b
}

fn main() {
    let mut num: u32 = 0x12345678;

    // Prints `0x12345678 - [12, 34, 56, 78]`
    println!("{:#x} - {:x?}", num, num.to_be_bytes());

    let parts = as_be_bytes_mut(&mut num);
    *parts[2] = 0xfe;

    // Should print `0x1234fe78 - [12, 34, fe, 78]`
    println!("{:#x} - {:x?}", num, num.to_be_bytes());
}

Output:

0x12345678 - [12, 34, 56, 78]
0x1234fe78 - [12, 34, fe, 78]

While it does look a little convoluted, the compiler manages to optimize it perfectly:

example::as_ne_bytes_mut:
        mov     rax, rdi
        lea     rcx, [rsi + 1]
        lea     rdx, [rsi + 2]
        mov     qword ptr [rdi], rsi
        add     rsi, 3
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rdx
        mov     qword ptr [rdi + 24], rsi
        ret

example::as_be_bytes_mut:
        mov     rax, rdi
        lea     rcx, [rsi + 1]
        lea     rdx, [rsi + 2]
        lea     rdi, [rsi + 3]
        mov     qword ptr [rax], rdi
        mov     qword ptr [rax + 24], rsi
        mov     qword ptr [rax + 8], rdx
        mov     qword ptr [rax + 16], rcx
        ret

example::as_le_bytes_mut:
        mov     rax, rdi
        lea     rcx, [rsi + 1]
        lea     rdx, [rsi + 2]
        mov     qword ptr [rdi], rsi
        add     rsi, 3
        mov     qword ptr [rdi + 24], rsi
        mov     qword ptr [rdi + 8], rcx
        mov     qword ptr [rdi + 16], rdx
        ret
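
Beyond reading the assembly, one way to gain some confidence in the byte order is to compare the helpers against the standard library's `to_be_bytes` and `to_le_bytes`. The sketch below repeats the three functions (with `if cfg!(...)` in place of the attribute, which should behave the same) and adds two hypothetical helpers, `read_all` and `check_against_std`, that exist only for this check.

```rust
pub fn as_ne_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let ptr: *mut u8 = (num as *mut u32).cast();
    // SAFETY: the four pointers stay within the `u32`, are aligned for
    // `u8`, and refer to distinct bytes, so the references do not alias.
    unsafe {
        [
            &mut *ptr.add(0),
            &mut *ptr.add(1),
            &mut *ptr.add(2),
            &mut *ptr.add(3),
        ]
    }
}

pub fn as_be_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let mut b = as_ne_bytes_mut(num);
    if cfg!(target_endian = "little") {
        b.reverse();
    }
    b
}

pub fn as_le_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let mut b = as_be_bytes_mut(num);
    b.reverse();
    b
}

/// Hypothetical helper: copies the referenced bytes into a plain array.
fn read_all(refs: [&mut u8; 4]) -> [u8; 4] {
    refs.map(|b| *b)
}

/// Hypothetical check: the reference arrays should list the bytes in the
/// same order as `to_be_bytes` and `to_le_bytes` on any target.
pub fn check_against_std(value: u32) -> bool {
    let mut num = value;
    let be_ok = read_all(as_be_bytes_mut(&mut num)) == value.to_be_bytes();
    let le_ok = read_all(as_le_bytes_mut(&mut num)) == value.to_le_bytes();
    be_ok && le_ok
}
```

A check like this only covers the ordering logic; soundness of the `unsafe` block is a separate question that a tool such as Miri is better suited to examine.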

Answer 3

Score: 1

It is possible to do this without pointer arithmetic.

Code:

pub fn as_le_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let num_slice = std::slice::from_mut(num);
    let (pref, middle, suff): (_, &mut [u8], _) = unsafe {
        num_slice.align_to_mut()
    };
    // Always true in practice when casting to `u8`, which has alignment 1
    assert!(pref.is_empty() && suff.is_empty());

    match middle {
        #[cfg(target_endian = "little")]
        [a, b, c, d] => [a, b, c, d],
        #[cfg(target_endian = "big")]
        [a, b, c, d] => [d, c, b, a],
        _ => unreachable!()
    }
}

pub fn as_be_bytes_mut(num: &mut u32) -> [&mut u8; 4] {
    let mut r = as_le_bytes_mut(num);
    r.reverse();
    r
}

And it compiles to nice assembly too: godbolt link.

huangapple
  • Posted on May 26, 2023, 13:38:14
  • Please keep this link when reposting: https://go.coder-hub.com/76337924.html