Handling Out of Range Hex/Unicode
Question
I'm working with a Rust cdylib crate that I'm referencing and using in C++.
use std::ffi::{c_char, CStr};

#[no_mangle]
pub extern "C" fn some_function(name: *const c_char, text: *const c_char) {
    unsafe {
        // Both calls panic if the incoming bytes are not valid UTF-8.
        let name = CStr::from_ptr(name).to_str().unwrap();
        let text = CStr::from_ptr(text).to_str().unwrap();
        // the rest
    }
}
When this function receives the character ±, it panics when attempting to get the text from the pointer. I'm passing this character in as a c_str() in C++ from a std::string:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src\lib.rs:102:50
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Is there any way that I can properly handle this character in Rust? I don't need to manipulate it in any way. Realistically, this library is simply acting as a middleman and just needs to pass it along.
When I use this to view the bytes I'm receiving:
let raw = CStr::from_ptr(text);
println!("Bytes: {:?}", raw.to_bytes_with_nul());
I get:
Bytes: [177, 0]
Answer 1
Score: 1
Here is how I reproduced your problem:
use std::ffi::CStr;

fn main() {
    let raw_data: &[u8] = &[177, 0];
    let raw = unsafe { CStr::from_ptr(raw_data.as_ptr().cast()) };
    println!("Bytes: {:?}", raw.to_bytes_with_nul());
    let string = raw.to_str().unwrap();
    println!("{}", string);
}
Bytes: [177, 0]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Utf8Error { valid_up_to: 0, error_len: Some(1) }', src\main.rs:9:31
The problem here is that to_str() expects a valid UTF-8 string. [177] is not valid UTF-8. The valid UTF-8 version would be:
println!("{:?}", "±".as_bytes());
[194, 177]
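To make the difference concrete, here is a minimal sketch that runs both byte sequences through the standard library's UTF-8 validation. The helper name `utf8_status` is made up for this illustration; it only wraps `std::str::from_utf8` and exposes the same two fields that appear in the panic message.

```rust
/// Hypothetical helper: validate `bytes` as UTF-8 and, on failure,
/// return (valid_up_to, error_len) from the resulting Utf8Error.
fn utf8_status(bytes: &[u8]) -> Result<&str, (usize, Option<usize>)> {
    std::str::from_utf8(bytes).map_err(|e| (e.valid_up_to(), e.error_len()))
}

fn main() {
    // 177 (0xB1) is a UTF-8 continuation byte, so it cannot start a character.
    println!("{:?}", utf8_status(&[177]));
    // 194, 177 (0xC2 0xB1) is the two-byte UTF-8 encoding of U+00B1 (±).
    println!("{:?}", utf8_status(&[194, 177]));
}
```

The first call reports an error at offset 0 with an invalid sequence of length 1, which matches the Utf8Error in the panic; the second call succeeds and yields "±".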
Yours seems to be encoded differently, for example as Windows-1252. I will simply assume so, because without knowing more about your code there is no way to tell for sure. It is very likely, though, as this is the default encoding for Windows in the western world.
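As a quick sanity check of that guess using only the standard library: in ISO-8859-1 (Latin-1), every byte maps to the Unicode code point with the same value, and Windows-1252 agrees with Latin-1 for bytes 0xA0 through 0xFF. The function `decode_latin1` below is a hypothetical helper written for this check, not part of any library.

```rust
/// Hypothetical helper: decode bytes as ISO-8859-1 (Latin-1), where each
/// byte maps to the Unicode code point with the same numeric value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    let decoded = decode_latin1(&[177]);
    println!("{:?}", decoded); // "±"
    println!("{:?}", decoded.as_bytes()); // [194, 177]
}
```

Byte 177 decodes to ±, which is consistent with the input being Windows-1252 or Latin-1. This shortcut is not a full Windows-1252 decoder: bytes 0x80 through 0x9F mean different things in the two encodings, so real input needs a proper conversion.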
The easiest way to convert between encodings is via the encoding_rs crate. Rust's standard library has no built-in support for legacy encodings such as Windows-1252, so you need an external crate for this, and this is the most established one.
use std::ffi::{c_char, CStr};
use encoding_rs::WINDOWS_1252;

fn main() {
    let raw_data: *const c_char = (&[177u8, 0u8]).as_ptr().cast();
    let raw = unsafe { CStr::from_ptr(raw_data) };
    println!("Bytes: {:?}", raw.to_bytes_with_nul());
    let (string, actual_encoding, errors) = WINDOWS_1252.decode(raw.to_bytes());
    println!("String: {:?}", string);
    println!("Actual encoding: {:?}", actual_encoding);
    println!("Errors: {}", errors);
}
Bytes: [177, 0]
String: "±"
Actual encoding: Encoding { windows-1252 }
Errors: false
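Since the question says the library only passes the text along, another option may be to skip decoding entirely and forward the raw bytes. The following is a rough sketch under that assumption; `borrow_bytes` is a hypothetical helper, and whether the downstream consumer accepts undecoded bytes is something only the real code can answer.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical helper: borrow the bytes behind a C string without decoding.
///
/// # Safety
/// `ptr` must be null or point to a valid NUL-terminated string that
/// stays alive and unmodified for the lifetime `'a`.
unsafe fn borrow_bytes<'a>(ptr: *const c_char) -> Option<&'a [u8]> {
    if ptr.is_null() {
        return None;
    }
    Some(unsafe { CStr::from_ptr(ptr) }.to_bytes())
}

fn main() {
    let raw: &[u8] = &[177, 0];
    let bytes = unsafe { borrow_bytes(raw.as_ptr().cast()) }.expect("non-null pointer");
    // The bytes are forwarded exactly as received, without the trailing NUL.
    println!("Bytes: {:?}", bytes);
    // For logging only: invalid UTF-8 is replaced with U+FFFD instead of panicking.
    println!("Lossy: {}", String::from_utf8_lossy(bytes));
}
```

This avoids the panic because nothing asserts that the input is UTF-8. If the text does have to become a Rust `String` at some point, the encoding question above still needs an answer.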