How to traverse character elements of *const char pointer in Rust?

Question

I'm new to Rust programming, and I have some difficulty where the language differs from C. For example, I have a C function as follows:

bool check(char* data, int size){
    int i;
    for(i = 0; i < size; i++){
        if( data[i] != 0x00){
            return false;
        }
    }
    return true;
}

How can I convert this function to Rust? I tried writing it the same way as in C, but it produces errors.

Answer 1

Score: 2

First off, I assume that you want to use as little unsafe code as possible. Otherwise there really isn't any reason to use Rust in the first place, as you forfeit all the advantages it brings you.

Depending on what data represents, there are multiple ways to transfer this to Rust.

First off: reading data through a raw pointer and a separate length argument is not possible in Rust without unsafe. Rust has the same concept, though; it's called a slice. A slice is the same pointer-plus-length combination, except that the compiler understands it: the borrow checker verifies at compile time that the data outlives the slice, and indexing is bounds-checked at runtime.
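
If the pointer and length really do arrive from C, one possible way to bridge the two worlds is to convert them into a slice once, at the boundary, and keep everything else safe. The following is only a sketch: the name check_raw and the decision to treat a null pointer as an empty buffer are assumptions made for illustration, not part of the original question.

```rust
use std::os::raw::c_char;
use std::slice;

/// Safe core: a slice carries its own length, so no size argument is needed.
pub fn check(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte == 0)
}

/// Hypothetical wrapper mirroring the C signature `bool check(char* data, int size)`.
///
/// # Safety
/// `data` must point to at least `size` readable bytes that remain valid
/// and unmodified for the duration of the call.
pub unsafe fn check_raw(data: *const c_char, size: usize) -> bool {
    // Assumption: a null pointer or zero length counts as "all zero".
    if data.is_null() || size == 0 {
        return true;
    }
    // The only unsafe step: build a borrowed slice from (pointer, length).
    let bytes: &[u8] = unsafe { slice::from_raw_parts(data.cast::<u8>(), size) };
    check(bytes)
}
```

The unsafe part stays confined to a single line; callers that already hold a slice or a Vec<u8> can call check directly.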

That said, a char* in C could actually be one of four things. Each of those things maps to a different type in Rust:

  • Binary data whose deallocation is taken care of somewhere else (in Rust terms: borrowed data)
    • maps to &[u8], a slice. The actual content of the slice is:
      • the address of the data as *const u8 (hidden from the user)
      • the length of the data as usize
  • Binary data that has to be deallocated within this function after using it (in Rust terms: owned data)
    • maps to Vec<u8>; as soon as it goes out of scope, the data is freed
    • actual content is:
      • the address of the data as a pointer to u8 (hidden from the user)
      • the length of the data as usize
      • the size of the allocation (the capacity) as usize. This allows for efficient push()/pop() operations. It is guaranteed that the length of the data does not exceed the capacity.
  • A string whose deallocation is taken care of somewhere else (in Rust terms: a borrowed string)
    • maps to &str, a so-called string slice
    • This is identical to &[u8], with the additional guarantee, enforced by the type, that it contains valid UTF-8 data.
  • A string that has to be deallocated within this function after using it (in Rust terms: an owned string)
    • maps to String
    • same as Vec<u8>, with the additional guarantee, enforced by the type, that it contains valid UTF-8 data.

You can create &[u8] references from Vec<u8>s and &str references from Strings.
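
The four mappings above can be sketched roughly as follows. All function names here are made up for illustration; the point is only to show which signatures borrow and which take ownership, and how a Vec<u8> or String is lent out as a slice.

```rust
/// Borrowed binary data: the caller keeps ownership and frees it later.
fn count_zero_bytes(data: &[u8]) -> usize {
    data.iter().filter(|&&byte| byte == 0).count()
}

/// Borrowed string: the type guarantees valid UTF-8.
fn count_chars(text: &str) -> usize {
    text.chars().count()
}

/// Owned binary data: the Vec is freed when this function returns.
fn consume_bytes(data: Vec<u8>) -> usize {
    data.len()
}

/// Owned string: the String is freed when this function returns.
fn consume_string(text: String) -> usize {
    text.len() // length in bytes, not in characters
}

/// Hypothetical driver that exercises all four signatures.
fn demo() -> (usize, usize, usize, usize) {
    let bytes: Vec<u8> = vec![0, 1, 0];
    let text: String = String::from("héllo");

    // `&bytes` coerces from &Vec<u8> to &[u8]; `&text` coerces from &String to &str.
    let zeros = count_zero_bytes(&bytes);
    let chars = count_chars(&text);

    // Passing by value moves ownership; `bytes` and `text` are unusable afterwards.
    let byte_len = consume_bytes(bytes);
    let text_len = consume_string(text);

    (zeros, chars, byte_len, text_len)
}
```

Note that "héllo" has five characters but occupies six bytes, because "é" takes two bytes in UTF-8.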


Now this is the point where I have to make an assumption. Because the function you posted checks whether all elements of data are zero, and returns false if it finds a non-zero element, I assume the content of data is binary data. And because your function does not contain a free call, I assume it is borrowed data.

With that knowledge, this is how the given function would translate to Rust:

fn check(data: &[u8]) -> bool {
    for d in data {
        if *d != 0x00 {
            return false;
        }
    }
    true
}

fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));

    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}

Output:

Check [0, 0, 0]: true
Check [0, 1, 0]: false

This is quite a direct translation; heavy use of explicit for loops is not really idiomatic in Rust. Good Rust code is mostly iterator-based; iterators are, most of the time, zero-cost abstractions that compile to very efficient code.

This is what your code would look like if rewritten with iterators:

fn check(data: &[u8]) -> bool {
    data.iter().all(|el| *el == 0x00)
}

fn main() {
    let x = vec![0, 0, 0];
    println!("Check {:?}: {}", x, check(&x));

    let y = vec![0, 1, 0];
    println!("Check {:?}: {}", y, check(&y));
}

Output:

Check [0, 0, 0]: true
Check [0, 1, 0]: false

The reason this is more idiomatic is that it's a lot easier to read for someone who hasn't written it. It clearly says "return true if all elements are equal to zero". With the for-based code, the reader needs a moment to work out whether it means "all elements are zero", "any element is zero", "all elements are non-zero" or "any element is non-zero".
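
To make that comparison concrete, here is a small sketch of how each of the four readings could be spelled with iterator adapters. The function names are chosen for illustration only; all() and any() are the standard Iterator methods.

```rust
/// "All elements are zero" (the behaviour of the original C function).
fn all_zero(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte == 0)
}

/// "Any element is zero".
fn any_zero(data: &[u8]) -> bool {
    data.iter().any(|&byte| byte == 0)
}

/// "All elements are non-zero".
fn all_nonzero(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte != 0)
}

/// "Any element is non-zero" (the logical negation of `all_zero`).
fn any_nonzero(data: &[u8]) -> bool {
    data.iter().any(|&byte| byte != 0)
}
```

One detail worth knowing: on an empty slice, all() returns true and any() returns false, which matches what the C loop does when size is 0.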

Note that with optimizations enabled, both versions typically compile to the same machine code.

Also note that, unlike in the C version, the Rust compiler guarantees that data is valid: the borrow checker verifies at compile time that the slice does not outlive the data it points to. Safe Rust (code without unsafe) cannot produce a double free, a use-after-free, an out-of-bounds memory access, or any other undefined behaviour that would corrupt memory; an out-of-range index causes a panic instead.

This is also the reason why Rust doesn't allow dereferencing raw pointers without unsafe: it needs the length of the data to check for out-of-bounds errors at runtime. That means accessing data via the [] operator can be a little more costly in Rust (it performs a bounds check on each access unless the optimizer can prove the check unnecessary), which is one reason iterator-based programming is so common. Iterators can often traverse data more efficiently than repeated indexing with [], because they don't need a per-element bounds check.
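
As a rough illustration of the difference, the sketch below puts an index-based loop next to the iterator version, plus a non-panicking lookup. The function names are assumptions for this example; whether the bounds checks in the indexed loop actually survive optimization depends on the compiler and is not something this sketch measures.

```rust
/// Index-based loop: every `data[i]` is conceptually bounds-checked,
/// although the optimizer can often prove the check redundant here.
fn check_indexed(data: &[u8]) -> bool {
    for i in 0..data.len() {
        if data[i] != 0 {
            return false;
        }
    }
    true
}

/// Iterator-based loop: no index is involved, so there is nothing to bounds-check.
fn check_iter(data: &[u8]) -> bool {
    data.iter().all(|&byte| byte == 0)
}

/// `get` returns `None` for an out-of-range index, whereas `data[index]` would panic.
fn byte_at(data: &[u8], index: usize) -> Option<u8> {
    data.get(index).copied()
}
```

Both loops return the same result for every input; the iterator form simply avoids the indexing operation altogether.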

huangapple
  • Posted on February 19, 2023, at 23:56:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75501403.html