Concatenate byte arrays without allocating.

Question

I would like to append a "marker" byte onto a byte slice to ensure it ends in a newline byte while parsing it. Here's what I think it could look like:

fn parse(inp: &[u8]) {
    // `basic_append` is the imagined operation; no such method exists on slices
    let workable_array = inp.basic_append(b'\n');
}

When a byte is accessed beyond the end of the original input, it should resolve to the added byte. After that, I'll only be using the new array in a read-only context.

I'm aware of the concat method on slices, but it internally allocates an entirely new vector, which seems unnecessarily costly, especially since the input string could be very large.

Answer 1

Score: 2

If you are asking for an operation that takes a &[u8] and returns another &[u8] that includes the newline byte without reallocating, then the answer is that this is impossible. A &[u8] always refers to contiguous memory, so the newline byte would have to sit physically right after the last byte of the slice, and that can only be guaranteed by copying the data into a new allocation. Besides, you could not write that byte through a &[u8] anyway, because it is an immutable borrow.
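If a copy is acceptable after all, it can at least be limited to a single, exactly-sized allocation rather than a buffer that has to grow. The following is a minimal sketch; the function name `with_newline` is hypothetical and only illustrates the idea.

```rust
/// Hypothetical helper: copies `inp` into a new buffer and appends b'\n'.
/// Reserving `len + 1` up front means the Vec never has to grow while
/// being filled, so only one allocation is made.
fn with_newline(inp: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(inp.len() + 1);
    buf.extend_from_slice(inp);
    buf.push(b'\n');
    buf
}

fn main() {
    let owned = with_newline(b"Hello");
    // Vec<u8> derefs to [u8], so a &[u8] view can be passed to slice-based APIs.
    let view: &[u8] = &owned;
    assert_eq!(view, &b"Hello\n"[..]);
}
```

The cost is still one full copy of the input, which is exactly what the question hoped to avoid; it just avoids the repeated reallocations a naive push loop might cause.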

However, you can achieve a similar effect with iterators. This does not allocate; it simply yields one extra newline byte after all the original bytes have been produced.

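// Yields every byte of `data`, then one extra b'\n'; nothing is allocated.
fn append_newline(data: impl Iterator<Item = u8>) -> impl Iterator<Item = u8> {
    data.chain(std::iter::once(b'\n'))
}

fn main() {
    let s = "Hello";
    // `str::bytes` iterates over the string's bytes without copying them.
    let s_iter_with_newline = append_newline(s.bytes());
    for b in s_iter_with_newline {
        println!("{:?}", b as char);
    }
}

Output:

'H'
'e'
'l'
'l'
'o'
'\n'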

Of course, this won't be compatible with functions that require a &[u8] as a parameter.
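The question describes indexed access, where reading one position past the original end yields the marker. If the parser only needs to read bytes by index, a small wrapper can provide that view without copying anything. This is a sketch under that assumption: `WithSentinel`, `len` and `get` are hypothetical names, not a standard library API, and like the iterator the wrapper is not a &[u8], so it still cannot be passed to slice-taking functions.

```rust
/// Hypothetical read-only view over a byte slice that behaves as if one
/// sentinel byte were appended to it. Nothing is copied or allocated.
struct WithSentinel<'a> {
    data: &'a [u8],
    sentinel: u8,
}

impl<'a> WithSentinel<'a> {
    fn new(data: &'a [u8], sentinel: u8) -> Self {
        WithSentinel { data, sentinel }
    }

    /// Logical length: the original bytes plus the sentinel.
    fn len(&self) -> usize {
        self.data.len() + 1
    }

    /// Byte at index `i`. Index `data.len()` yields the sentinel;
    /// anything beyond that is out of range and returns `None`.
    fn get(&self, i: usize) -> Option<u8> {
        if i < self.data.len() {
            Some(self.data[i])
        } else if i == self.data.len() {
            Some(self.sentinel)
        } else {
            None
        }
    }
}

fn main() {
    let input: &[u8] = b"Hello";
    let view = WithSentinel::new(input, b'\n');
    assert_eq!(view.len(), 6);
    assert_eq!(view.get(0), Some(b'H'));
    assert_eq!(view.get(5), Some(b'\n')); // one past the original end
    assert_eq!(view.get(6), None);

    // A scanner can rely on eventually reaching the sentinel.
    let mut i = 0;
    while view.get(i) != Some(b'\n') {
        i += 1;
    }
    assert_eq!(i, 5);
}
```

The extra bounds check in `get` is the price paid for not copying; for a large input this is usually cheaper than duplicating the whole buffer.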

Further remarks:

  • It sounds like your input sometimes ends with a newline and sometimes does not. It is much easier to cut the newline off the inputs that have one than to add it to those that don't. Maybe change your parsing algorithm so that it expects input without a trailing newline?
  • You are using &[u8] here for what sounds like string operations. Be aware that a Rust char is not a u8: Rust strings are UTF-8 encoded, so a single character can take between one and four bytes. That's the reason the &str/String types exist. Use those for string processing, as they guarantee valid UTF-8 and handle multi-byte characters correctly, unlike &[u8]. Of course, the '\n' problem you are having still exists with str; appending to it is still not possible without reallocating.
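The first remark suggests normalizing the input by removing a trailing newline instead of adding one. Removing it only shortens the view, so it can be done by returning a sub-slice with no copy. A minimal sketch, assuming at most one trailing newline should be dropped; the helper name `strip_trailing_newline` is hypothetical, while `strip_suffix` is the standard method available on both slices and str.

```rust
/// Hypothetical helper: removes at most one trailing b'\n' so the parser
/// can assume its input never ends in a newline. The result borrows from
/// `inp`, so nothing is copied.
fn strip_trailing_newline(inp: &[u8]) -> &[u8] {
    inp.strip_suffix(b"\n").unwrap_or(inp)
}

fn main() {
    assert_eq!(strip_trailing_newline(b"abc\n"), &b"abc"[..]);
    assert_eq!(strip_trailing_newline(b"abc"), &b"abc"[..]);

    // The same approach on &str, after validating the bytes as UTF-8.
    let text = std::str::from_utf8(b"abc\n").expect("input is valid UTF-8");
    assert_eq!(text.strip_suffix('\n').unwrap_or(text), "abc");
}
```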
huangapple
  • Published on 2023-05-14 06:36:02
  • Please keep this link when reposting: https://go.coder-hub.com/76245132.html