Background Information

This question relates to serde and the output produced by Serializing and Deserializing Rust structs.

  • My aim is to write a nested structure which serializes as a String, which is human readable.
  • The sub-component of the main structure should itself be serialized in a variable way.
  • If it is serialized using a human readable format, which serializes as a String, then the whole structure will be human readable.
  • If the sub-component of the main structure is serialized with a binary format, then only the other parts of the main structure will be human readable. The sub-component itself will not be.


Rust Strings have to be (should be) valid utf-8. (The rest of the standard library assumes that they are, so while using from_utf8_unchecked can create a String which isn't valid utf-8, this is probably not a good idea.)
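
A minimal sketch of that validity requirement, using only the standard library. The helper name bytes_to_text and the byte values are made up for illustration; the point is just that the safe constructors check for UTF-8 up front:

```rust
/// Try to view arbitrary bytes as text; returns None when they are not valid UTF-8.
fn bytes_to_text(bytes: Vec<u8>) -> Option<String> {
    String::from_utf8(bytes).ok()
}

fn main() {
    // JSON output is always valid UTF-8, so this conversion succeeds.
    let json = br#"{"my_data":10}"#.to_vec();
    println!("{:?}", bytes_to_text(json));

    // 0xFF can never appear in UTF-8, so a binary payload like this one is rejected.
    let binary = vec![0x08, 0xff, 0x01];
    println!("{:?}", bytes_to_text(binary));

    // The lossy conversion substitutes U+FFFD for each invalid sequence instead of failing.
    println!("{}", String::from_utf8_lossy(&[b'o', b'k', 0xff]));
}
```

So a String works for any format whose output is guaranteed to be UTF-8, and stops working as soon as the format is binary.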

Serializing as JSON, or other formats

I am currently serializing and deserializing Rust structs in JSON format. This produces valid UTF-8 strings.

However, JSON is a temporary design choice. It will likely be replaced in the future, most probably by Protobuf or some other binary format such as Avro.

In this case, the output from serializing will no longer be valid UTF-8. It is not a UTF-8 string, but an arbitrary binary string.

What structure can be used to hold an arbitrary binary String? The most obvious choice would seem to be Vec<u8>, however there are issues with this which I will describe below.

Another possible alternative might be OsString. However I am not sure if this is really the correct container to use. The name OsString doesn't quite imply the same level of generality as a name like BinaryString would, hence I am not totally clear about the intended purpose of this container, particularly given that Vec<u8> exists.

Results of Serializing and Deserializing with Serde

serde_json provides two pairs of functions for serializing and deserializing. These are:

  • to_string and from_str
  • to_vec and from_slice

The following demonstration code shows the output produced by both of these:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct MessageBody {
    pub encoded_message_body: String,
    pub encoded_message_body_vec: Vec<u8>,
}

#[derive(Serialize, Deserialize)]
struct Message {
    pub message_body: MessageBody,
}

#[derive(Serialize, Deserialize)]
struct MyType {
    pub my_data: i32,
}

fn main() {
    let my_type = MyType { my_data: 10 };

    // Encode the inner value twice using serde_json: once as a String, once as a Vec<u8>
    let encoded_my_type = serde_json::to_string(&my_type).unwrap();
    let encoded_my_type_vec = serde_json::to_vec(&my_type).unwrap();

    let message_body = MessageBody {
        encoded_message_body: encoded_my_type,
        encoded_message_body_vec: encoded_my_type_vec,
    };

    let message = Message { message_body };

    // Encode the outer message; the to_vec variant is kept for comparison
    //let encoded_message = serde_json::to_vec(&message).unwrap();
    let encoded_message = serde_json::to_string(&message).unwrap();
    println!("{:?}", encoded_message);
}

The output is shown below.


One output is human readable, the other is not. encoded_message_body is human readable. encoded_message_body_vec has serialized as a literal string representation of an array containing numerical values, which are themselves formatted as String. This is not human readable, despite the fact that the binary data in memory is a block of ASCII (utf-8) characters containing something which is formatted as a JSON string.
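
To make that last point concrete, here is a small standard-library sketch. The byte values are worked out by hand from the compact JSON text for MyType { my_data: 10 }, and the helper name decode_byte_array is made up for illustration:

```rust
/// Turn a list of byte values, as they appear in the JSON array, back into text.
fn decode_byte_array(values: &[u8]) -> Option<&str> {
    std::str::from_utf8(values).ok()
}

fn main() {
    // The ASCII codes of the compact JSON text for MyType { my_data: 10 }.
    let values: &[u8] = &[123, 34, 109, 121, 95, 100, 97, 116, 97, 34, 58, 49, 48, 125];
    println!("{:?}", decode_byte_array(values));
}
```

The information is identical in both fields. Only the container type decides whether it is written as text or as a list of numbers.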

Please note: I have chosen to "display" the encoded message using println!. This data is actually being directed to Kafka. However, I could not write a MWE if Kafka is included. Exactly the same text is shown from a Kafka console consumer.

In line with the above information, I am aiming to produce something which looks more like this:



It appears that I am faced with a choice:

  • arbitrary serialization format (JSON, Proto, or other) by storing the result in a Vec<u8>, but this is not human readable
  • human readable output by storing the result in String, but this only supports encodings such as JSON which serialize to valid utf-8

Is it possible to have both human readable output, and arbitrary serialization format? Of course, for binary formats, the output will be a garbled string, and will not be human readable, but for formats such as JSON, it will maintain the human readability benefits.

Hopefully the question is clear? If not please ask further questions and I will try to clarify...

Further Reflections

Someone else (offline) raised the point that we can't dump raw binary into a JSON formatted thing, because that binary will likely contain data (characters) which have special meaning to JSON decoders. (eg: {...)

This is a slightly different way of thinking about the problem, and it is a very good point to have raised.

  • This means that if we want to have the outer message encoded with JSON (which we do) then the binary payload part must be encoded with a binary-to-text encoding such as Base64 or Base85.
  • Someone else pointed this out in one of the answers too.

This pushes me towards thinking that we might not actually gain anything by changing the "payload" to a binary encoding, even if the payload is large, because this would inevitably require encoding and decoding through an additional layer.

[struct] <-> BinarySerializedData <-> BaseXXEncoded (String, `encoded_message_body`)
    <-> JSON Encoded String <-> [Wire]
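
A minimal sketch of that extra layer, assuming standard Base64 (RFC 4648) as the BaseXX step. The encoder and decoder are hand-rolled only to keep the example self-contained; the names base64_encode and base64_decode are illustrative, the payload bytes are made up, and a maintained crate would be the better choice in real code:

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode arbitrary bytes as standard Base64 text with '=' padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit word; missing bytes count as zero.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let word = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[(word >> 18) as usize & 63] as char);
        out.push(ALPHABET[(word >> 12) as usize & 63] as char);
        // Emit '=' padding where the input ran out.
        out.push(if chunk.len() > 1 { ALPHABET[(word >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[word as usize & 63] as char } else { '=' });
    }
    out
}

/// Decode standard Base64 text; returns None on any character outside the alphabet.
fn base64_decode(text: &str) -> Option<Vec<u8>> {
    let trimmed = text.trim_end_matches('=');
    let mut out = Vec::with_capacity(trimmed.len() * 3 / 4);
    let mut buffer = 0u32;
    let mut bits = 0u32;
    for c in trimmed.bytes() {
        let value = ALPHABET.iter().position(|&a| a == c)? as u32;
        // Accumulate six bits per character and emit a byte whenever eight are available.
        buffer = (buffer << 6) | value;
        bits += 6;
        if bits >= 8 {
            bits -= 8;
            out.push((buffer >> bits) as u8);
        }
    }
    Some(out)
}

fn main() {
    // Stand-in for the output of a binary serializer; these bytes are not valid UTF-8.
    let payload: Vec<u8> = vec![0x08, 0x0a, 0xff, 0x00];
    let encoded = base64_encode(&payload);

    // Base64 text contains no characters that need escaping inside a JSON string.
    let message = format!(r#"{{"message_body":{{"encoded_message_body":"{}"}}}}"#, encoded);
    println!("{}", message);

    // The receiving side reverses the layer to get the binary payload back.
    assert_eq!(base64_decode(&encoded), Some(payload));
}
```

Base64 grows the payload by roughly a third (four output characters per three input bytes), which is part of the cost that would have to be measured.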

To understand the performance impact, some tests would need to be run. It might make sense to do this, or it might not, possibly depending on the average size of the binary payload.


Score: 2

> from_utf8_unchecked can create a String which isn't valid utf-8, this is probably not a good idea

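It's straight-up UB: from_utf8_unchecked is an unsafe function precisely because the caller must guarantee that the bytes are valid UTF-8.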

> What structure can be used to hold an arbitrary binary String? The most obvious choice would seem to be Vec<u8>, however there are issues with this which I will describe below.

There is bstr.

> Another possible alternative might be OsString. However I am not sure if this is really the correct container to use.

Absolutely not.

> Is it possible to have both human readable output, and arbitrary serialization format? Of course, for binary formats, the output will be a garbled string, and will not be human readable, but for formats such as JSON, it will maintain the human readability benefits.

There are basically two ways:

  • an ASCII-binary format, which outputs the ASCII data directly and handles the non-ASCII bytes separately, e.g. bstr (I think) or a bespoke printer for Vec<u8>. However, this may not be the best idea, as the binary data may contain sequences which are interpreted by e.g. the terminal, leading to odd effects unless the printer takes that into account
  • alternatively, smuggle the binary data into a normal string using a binary-to-text encoding, e.g. base32, base64, base85, uuencode, ... This generally requires some framing, but it is a common method dating back to the origins of computing
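
A minimal sketch of the first option using only the standard library: the escape_ascii method on byte slices keeps printable ASCII and turns everything else, including terminal control bytes, into backslash escapes. The helper name printable and the sample bytes are made up for illustration:

```rust
/// Render bytes for display: printable ASCII is kept, everything else becomes an escape.
fn printable(bytes: &[u8]) -> String {
    bytes.escape_ascii().to_string()
}

fn main() {
    // ASCII text, a terminal escape sequence (ESC [ 3 1 m), and a byte that is not valid UTF-8.
    let payload = b"ok\x1b[31m\xff";
    println!("{}", printable(payload));
}
```

Note that this also escapes quotes, backslashes, and every non-ASCII byte, so valid non-ASCII UTF-8 text becomes unreadable too. It is a display format, not something a JSON decoder will turn back into the original bytes.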


Score: 1

I could find nothing that supported that use-case in serde or other popular crates, but we can build our own!


A possible strategy is to use a wrapper around Vec<u8>, and serialize it like an array of strings and numbers, for each Unicode and non-Unicode chunk in the data. I will use the bstr crate for that, because it is suited for handling partially-UTF-8 data.

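use bstr::{BString, ByteSlice};
use serde::ser::{Serialize, SerializeSeq, Serializer};

pub struct HumanReadableBStr(BString);

impl Serialize for HumanReadableBStr {
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let mut seq = serializer.serialize_seq(None)?;
        for chunk in self.0.utf8_chunks() {
            // NOTE: the branch bodies were cut off in this copy; these are an assumed reconstruction.
            if !chunk.valid().is_empty() {
                seq.serialize_element(chunk.valid())?; // valid UTF-8 run, emitted as a string
            }
            if !chunk.invalid().is_empty() {
                seq.serialize_element(chunk.invalid())?; // invalid bytes, emitted as numbers
            }
        }
        seq.end()
    }
}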
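
To show what the chunking itself produces, independent of serde, here is a rough standard-library sketch. It assumes Rust 1.79 or newer, where byte slices have their own utf8_chunks method with the same valid()/invalid() accessors; the Piece enum and split_chunks function are names made up for this illustration:

```rust
/// One element of the would-be output array: readable text or leftover raw bytes.
#[derive(Debug, PartialEq)]
enum Piece {
    Text(String),
    Bytes(Vec<u8>),
}

/// Split data into valid UTF-8 runs and the invalid bytes between them.
fn split_chunks(data: &[u8]) -> Vec<Piece> {
    let mut pieces = Vec::new();
    for chunk in data.utf8_chunks() {
        if !chunk.valid().is_empty() {
            pieces.push(Piece::Text(chunk.valid().to_string()));
        }
        if !chunk.invalid().is_empty() {
            pieces.push(Piece::Bytes(chunk.invalid().to_vec()));
        }
    }
    pieces
}

fn main() {
    // Mostly text, with one byte (0xFF) that is not valid UTF-8 in the middle.
    println!("{:?}", split_chunks(b"hi\xffyo"));
}
```

Data that is entirely valid UTF-8 comes out as a single text piece, so JSON payloads stay readable, while binary payloads degrade into a mix of short strings and numbers.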

