How to slice a string containing UTF-8 characters in Rust

Question

I am writing a toy parser in Rust, and I want to handle UTF-8 characters in my string input. I know that I need to use the chars method to get an iterator that yields each character correctly, but I want to slice the string using character indices rather than byte offsets. Is there any method I can use? I took a look at SWC, but I can't understand how it handles UTF-8 strings, because it seems the input API expects the caller to work out the correct UTF-8 indices.

use swc_common::input::{StringInput, Input};
use swc_common::BytePos;
fn main() {
    let utf8_str = "中文字串";
    let mut input = StringInput::new(utf8_str, BytePos(0), BytePos(utf8_str.len().try_into().unwrap()));
    println!("{:?}", input.slice(BytePos(0), BytePos(3)));
    println!("{:?}", &utf8_str[0..3]);
    // Is there any function like slice(start_usize, end_usize) that returns the string for a range of character indices?
}

Answer 1

Score: 1

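Slicing with character indices is not supported, and since the SliceIndex trait is sealed, you can't implement it yourself. However, you can use char_indices to compute the byte index at which each UTF-8 character starts: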

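fn main() {
    let utf8_str = "中文字串";
    // Character (not byte) positions; end_char is exclusive and must be greater than start_char.
    let start_char = 1;
    let end_char = 2;
    // Byte offset at which each character starts.
    let mut indices = utf8_str.char_indices().map(|(i, _)| i);
    let start = indices.nth(start_char).unwrap();
    // nth already consumed the first start_char + 1 offsets, so count on from there;
    // if end_char reaches past the last character, fall back to the end of the string.
    let end = indices.nth(end_char - start_char - 1).unwrap_or(utf8_str.len());
    println!("{:?}", &utf8_str[start..end]);
}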

Output:

"文"
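
The same idea can be wrapped in a reusable function. The following is only a sketch: slice_chars is a hypothetical helper name, not part of the standard library or SWC, and it assumes a half-open character range (start_char inclusive, end_char exclusive). It returns None instead of panicking when the range is empty-reversed or runs past the end of the string.

```rust
/// Returns the substring covering characters `start_char..end_char`
/// (counted in `char`s, not bytes), or `None` if the range is invalid.
/// `slice_chars` is a hypothetical helper, not a standard library API.
fn slice_chars(s: &str, start_char: usize, end_char: usize) -> Option<&str> {
    if start_char > end_char {
        return None;
    }
    // Byte offset at which each character starts, followed by the length of
    // the string, so an index equal to the character count is also valid.
    let mut offsets = s
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(s.len()));
    let start = offsets.nth(start_char)?;
    let end = if end_char == start_char {
        start
    } else {
        // `nth` already consumed `start_char + 1` offsets, so count on from there.
        offsets.nth(end_char - start_char - 1)?
    };
    // Both offsets lie on character boundaries, so `get` returns `Some` here.
    s.get(start..end)
}

fn main() {
    let utf8_str = "中文字串";
    println!("{:?}", slice_chars(utf8_str, 1, 2)); // Some("文")
    println!("{:?}", slice_chars(utf8_str, 0, 4)); // Some("中文字串")
    println!("{:?}", slice_chars(utf8_str, 2, 2)); // Some("")
    println!("{:?}", slice_chars(utf8_str, 3, 9)); // None
}
```

Note that each call walks the string from the beginning, so the cost is linear in the string length; a parser that slices repeatedly is usually better off tracking byte offsets as it iterates. Also, a char is a Unicode scalar value, so a character made of several code points (for example a letter plus a combining accent) counts as more than one index.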

huangapple
  • Published on July 24, 2023 at 19:46:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76754171.html