How to slice a string containing UTF-8 characters in Rust
Question
I am writing a toy parser in Rust, and I want to handle UTF-8 characters in my string input. I know that I need to use the chars method to get an iterator that correctly yields UTF-8 characters, but I want to slice the string using character indices. Is there any method I can use? I looked into SWC, but I can't understand how it handles UTF-8 strings, because its input API seems to require you to work out the correct UTF-8 indices yourself.
use swc_common::input::{StringInput, Input};
use swc_common::BytePos;
fn main() {
    let utf8_str = "中文字串";
    let mut input = StringInput::new(utf8_str, BytePos(0), BytePos(utf8_str.len().try_into().unwrap()));
    println!("{:?}", input.slice(BytePos(0), BytePos(3)));
    // Byte range 0..3 is exactly "中", which is 3 bytes long in UTF-8.
    println!("{:?}", &utf8_str[0..3]);
    // Is there any function like slice(start_char, end_char) that takes character indices?
}
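In the code above, &utf8_str[0..3] returns "中" rather than three characters because str ranges are byte offsets, not character positions. Each of these CJK characters is 3 bytes in UTF-8, so the 4-character string is 12 bytes long. The following minimal std-only sketch shows the difference. It does not use SWC. `str::get` returns `None` instead of panicking when a byte range does not fall on character boundaries:

```rust
fn main() {
    let utf8_str = "中文字串";

    // len() counts bytes; chars().count() counts Unicode scalar values.
    println!("bytes: {}, chars: {}", utf8_str.len(), utf8_str.chars().count()); // bytes: 12, chars: 4

    // Each of these characters takes 3 bytes in UTF-8.
    for c in utf8_str.chars() {
        println!("{} -> {} bytes", c, c.len_utf8());
    }

    // Byte range 0..3 covers exactly the first character.
    println!("{:?}", utf8_str.get(0..3)); // Some("中")

    // Byte 1 falls inside "中", so get() returns None here,
    // whereas &utf8_str[0..1] would panic.
    println!("{:?}", utf8_str.get(0..1)); // None
    println!("{}", utf8_str.is_char_boundary(1)); // false
    println!("{}", utf8_str.is_char_boundary(3)); // true
}
```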
Answer 1
Score: 1
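Slicing with character indices is not supported, and since the SliceIndex trait is sealed you can't implement it yourself. However, you can use char_indices to compute the byte index that corresponds to each UTF-8 character: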
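fn main() {
    let utf8_str = "中文字串";
    // Character (not byte) positions: take the chars in [1, 2).
    let start_char = 1;
    let end_char = 2;
    // Byte offset where each char starts: 0, 3, 6, 9.
    let mut indices = utf8_str.char_indices().map(|(i, _)| i);
    // Byte offset of the start_char-th char.
    let start = indices.nth(start_char).unwrap();
    // nth already consumed everything up to start_char, so skip end_char - start_char - 1 more;
    // if there is no char at end_char (end of string), use the string's byte length.
    let end = indices.nth(end_char - start_char - 1).unwrap_or(utf8_str.len());
    println!("{:?}", &utf8_str[start..end]);
}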
Output:
"文"
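The code above is written for one fixed range. It breaks in two cases. When end_char == start_char, the subtraction end_char - start_char - 1 overflows, which panics in debug builds. When start_char equals the number of characters, unwrap() panics, even though an empty slice at the end of the string is valid. The sketch below wraps the same char_indices approach in a reusable helper that returns `Option` instead of panicking. The name `slice_chars` is hypothetical and is not a standard library function:

```rust
/// Returns the substring of `s` between character positions `start`
/// (inclusive) and `end` (exclusive), or `None` if the range is invalid.
/// Positions count `char`s (Unicode scalar values), not bytes.
fn slice_chars(s: &str, start: usize, end: usize) -> Option<&str> {
    if start > end {
        return None;
    }
    // Byte offset of every char, followed by s.len() so that a position
    // equal to the char count maps to the end of the string.
    let mut offsets = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));
    let start_byte = offsets.nth(start)?;
    // `nth` has already consumed the element at `start`, so the element
    // at `end` is `end - start - 1` steps further on.
    let end_byte = if end == start {
        start_byte
    } else {
        offsets.nth(end - start - 1)?
    };
    Some(&s[start_byte..end_byte])
}

fn main() {
    let utf8_str = "中文字串";
    println!("{:?}", slice_chars(utf8_str, 1, 2)); // Some("文")
    println!("{:?}", slice_chars(utf8_str, 1, 4)); // Some("文字串")
    println!("{:?}", slice_chars(utf8_str, 4, 4)); // Some("")
    println!("{:?}", slice_chars(utf8_str, 2, 5)); // None (past the end)
}
```

The returned &str borrows from the input, so nothing is allocated. An alternative is chars().skip(start).take(end - start).collect::<String>(), which copies the characters into a new String. Each call still scans the string from the beginning, so it costs O(n) in the string length. A parser that slices often may be better off recording byte offsets as it advances, which is the kind of position SWC's BytePos holds.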