Read NUL-terminated String from ByteBuffer

Question

ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
int startPosition = b.position();
int nullTerminatorIndex = -1;

while (b.hasRemaining()) {
    byte currentByte = b.get();
    if (currentByte == 0) {
        nullTerminatorIndex = b.position() - 1;
        break;
    }
}

if (nullTerminatorIndex != -1) {
    byte[] stringBytes = new byte[nullTerminatorIndex - startPosition];
    b.position(startPosition);
    b.get(stringBytes);
    b.get(); // Move past the null terminator

    String s0 = new String(stringBytes, StandardCharsets.UTF_8);

    nullTerminatorIndex = -1;
    startPosition = b.position();

    while (b.hasRemaining()) {
        byte currentByte = b.get();
        if (currentByte == 0) {
            nullTerminatorIndex = b.position() - 1;
            break;
        }
    }

    if (nullTerminatorIndex != -1) {
        stringBytes = new byte[nullTerminatorIndex - startPosition];
        b.position(startPosition);
        b.get(stringBytes);

        String s1 = new String(stringBytes, StandardCharsets.UTF_8);
    }
}

In this code, we iterate through the ByteBuffer starting from the current position until we find a null terminator (byte value 0), which indicates the end of the UTF-8 string. Once we find the null terminator, we create a byte array containing the bytes of the string, and then create a String using the StandardCharsets.UTF_8 encoding.

Please note that this code assumes the byte buffer contains valid UTF-8 strings, each terminated by a NUL byte. If no NUL byte is found before the buffer's limit, the corresponding string is simply not read, so real code should handle that case explicitly.
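
The two nearly identical scan-and-copy passes above could be folded into one helper. A minimal sketch, assuming a hypothetical class NulStringReader and method readNulTerminated (neither name comes from the original post), and assuming that returning null is an acceptable way to signal "no terminator found":

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class NulStringReader {
    /**
     * Reads one NUL-terminated UTF-8 string starting at buf.position().
     * On success the position is left just past the terminator.
     * Returns null (position unchanged) if no NUL byte occurs before the limit.
     */
    static String readNulTerminated(ByteBuffer buf) {
        int start = buf.position();
        // Absolute get(int) does not move the position while we scan.
        for (int i = start; i < buf.limit(); i++) {
            if (buf.get(i) == 0) {
                byte[] bytes = new byte[i - start];
                buf.get(bytes); // relative bulk get: copies the string bytes and advances the position
                buf.get();      // skip the NUL terminator
                return new String(bytes, StandardCharsets.UTF_8);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        String s0 = readNulTerminated(b);
        String s1 = readNulTerminated(b);
        System.out.println(s0 + " " + s1); // prints "abcd 124"
    }
}
```

Scanning with the absolute get(int) means the position never has to be rewound, which is the step the longer version needs b.position(startPosition) for.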

Original question:

How can I read a NUL-terminated UTF-8 string from a Java ByteBuffer, starting at ByteBuffer#position()?

ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
String s0 = /* read first string */;
String s1 = /* read second string */;

// `s0` should now contain "abcd" and `s1` should contain "124".

I have already tried Charsets.UTF_8.decode(b), but it seems to ignore the current ByteBuffer position and read until the end of the buffer.

Is there a more idiomatic way to read such a string from a byte buffer than searching for the byte containing 0 and then limiting the buffer to it (or copying the part containing the string into a separate buffer)?
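
For context, a small sketch of what decode does with such a buffer. Charset.decode(ByteBuffer) decodes from the buffer's current position up to its limit, so a NUL byte comes out as an ordinary char instead of ending the string. The sketch uses the JDK's StandardCharsets rather than the Charsets class mentioned above; the class DecodeRange and helper decodeUpTo are hypothetical names, and the NUL index is hard-coded purely for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DecodeRange {
    /** Decodes bytes [position, end) of buf as UTF-8 without touching buf's own position or limit. */
    static String decodeUpTo(ByteBuffer buf, int end) {
        ByteBuffer view = buf.duplicate(); // shares the bytes, but has an independent position and limit
        view.limit(end);
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        b.position(5); // start of the second string

        // Decoding everything that remains keeps the trailing NUL as a char.
        String rest = StandardCharsets.UTF_8.decode(b.duplicate()).toString();
        System.out.println(rest.length()); // prints 4: '1', '2', '4' and '\0'

        // Narrowing the limit to the NUL's index (8 here) yields just the string.
        System.out.println(decodeUpTo(b, 8)); // prints "124"
    }
}
```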

Answer 1

Score: 6

If "idiomatic" means a one-liner, then not that I know of (unsurprising, since NUL-terminated strings are not part of the Java spec).

The first thing I came up with is using b.slice().limit(x) to create a lightweight view onto only the desired bytes (better than copying them anywhere, as you might be able to work directly with the buffer):

ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00 });
int i;
while (b.hasRemaining()) {
  ByteBuffer nextString = b.slice(); // View on b with same start position
  for (i = 0; b.hasRemaining() && b.get() != 0x00; i++) {
    // Count to next NUL
  }
  nextString.limit(i); // view now stops before NUL
  CharBuffer s = StandardCharsets.UTF_8.decode(nextString);
  System.out.println(s);
}
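
The same slice-and-limit loop can be wrapped in a method that collects the strings instead of printing them. This is only a sketch: the class NulSplitter and method readAll are hypothetical names, and, like the loop above, it treats a trailing run of bytes with no terminator as a final string:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class NulSplitter {
    /** Consumes b from its current position and returns every NUL-delimited UTF-8 string. */
    static List<String> readAll(ByteBuffer b) {
        List<String> out = new ArrayList<>();
        while (b.hasRemaining()) {
            ByteBuffer view = b.slice(); // starts at b.position() and shares b's bytes
            int length = 0;
            while (b.hasRemaining() && b.get() != 0) {
                length++;                // count bytes up to the next NUL (or the end)
            }
            view.limit(length);          // the view now stops just before the NUL
            out.add(StandardCharsets.UTF_8.decode(view).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        System.out.println(readAll(b)); // prints "[abcd, 124]"
    }
}
```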

Answer 2

Score: 1

In Java, the char \u0000 (the UTF-8 byte 0, the Unicode code point U+0000) is a normal char. So read everything (maybe into an oversized byte array), and do:

String s = new String(bytes, StandardCharsets.UTF_8);

String[] s0s1 = s.split("\u0000");
String s0 = s0s1[0];
String s1 = s0s1[1];
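
A runnable sketch of the snippet above, with the undefined bytes variable filled in by a zero-padded array to stand in for the "oversized" case. The class SplitOnNul and method splitOnNul are hypothetical names. The padding is harmless because String.split drops trailing empty strings:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SplitOnNul {
    /** Decodes the whole array as UTF-8 and splits it at every NUL char. */
    static String[] splitOnNul(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.split("\0"); // trailing empty strings (from zero padding) are discarded
    }

    public static void main(String[] args) {
        byte[] data = {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00};
        byte[] padded = Arrays.copyOf(data, 16); // zero-filled up to 16 bytes
        System.out.println(Arrays.toString(splitOnNul(padded))); // prints "[abcd, 124]"
    }
}
```

Note that two consecutive NUL bytes in the middle of the data would still produce an empty string in the result; only trailing empties are dropped.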

If you do not have fixed positions and must read every byte sequentially, the code gets ugly. One of the C founders indeed called the NUL-terminated string a historic mistake.

For the reverse direction, there is modified UTF-8: to avoid producing a 0 byte when encoding a Java String (normally because the result will be processed further as a C/C++ NUL-terminated string), it encodes the char \u0000 as the two bytes 0xC0 0x80 instead.
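
To make the difference concrete, here is a small sketch comparing standard UTF-8 with modified UTF-8 as written by DataOutputStream.writeUTF. The class ModifiedUtf8Demo and helper modifiedUtf8 are hypothetical names; writeUTF also writes a 2-byte length prefix, which the helper strips so that only the encoded characters are compared:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ModifiedUtf8Demo {
    /** Encodes s with DataOutputStream.writeUTF and removes the 2-byte length prefix. */
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        String s = "a\0b";
        // Standard UTF-8 emits a real 0 byte for the NUL char.
        System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8))); // prints "[97, 0, 98]"
        // Modified UTF-8 emits 0xC0 0x80 (shown as signed bytes -64, -128) instead.
        System.out.println(Arrays.toString(modifiedUtf8(s))); // prints "[97, -64, -128, 98]"
    }
}
```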

Answer 3

Score: 0

You can do it with the replace and split functions. Decode your bytes to a String, replace each NUL character with a custom delimiter character, and then split the string on that delimiter.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Created by Administrator on 8/25/2020.
 */
public class Jtest {
    public static void main(String[] args) {
        //ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
        ByteBuffer b = ByteBuffer.allocate(10);

        b.put((byte)0x61);
        b.put((byte)0x62);
        b.put((byte)0x63);
        b.put((byte)0x64);
        b.put((byte)0x00);
        b.put((byte)0x31);
        b.put((byte)0x32);
        b.put((byte)0x34);
        b.put((byte)0x00);
        b.rewind();

        String s0;
        String s1;

        // print the ByteBuffer
        System.out.println("Original ByteBuffer:  "
                + Arrays.toString(b.array()));

        // words[0] will contain "abcd" and words[1] will contain "124".
        String s = StandardCharsets.UTF_8.decode(b).toString();
        String ss = s.replace((char) 0, ';');
        String[] words = ss.split(";");
        for (int i = 0; i < words.length; i++) {
            System.out.println(" Word " + i + " = " + words[i]);
        }

    }
}

I believe you can make this more efficient by dropping the replace call and working with the NUL character directly.
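
One possible way to drop the replace step, sketched under my own assumptions: decode once, then walk the string with indexOf('\0', from) and substring, which avoids both the intermediate copy and the ';' delimiter (so data containing ';' is not split by accident). The class DecodeThenScan and method words are hypothetical names, and unlike split, this version skips every empty piece, not only trailing ones:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DecodeThenScan {
    /** Decodes all remaining bytes of b as UTF-8 and returns the non-empty NUL-delimited pieces. */
    static List<String> words(ByteBuffer b) {
        String s = StandardCharsets.UTF_8.decode(b).toString();
        List<String> out = new ArrayList<>();
        int from = 0;
        int nul;
        while ((nul = s.indexOf('\0', from)) >= 0) {
            if (nul > from) {
                out.add(s.substring(from, nul)); // piece between the previous NUL and this one
            }
            from = nul + 1;                      // continue just past this NUL
        }
        if (from < s.length()) {
            out.add(s.substring(from));          // trailing piece with no terminator
        }
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.allocate(10);
        b.put(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        b.rewind(); // limit stays at 10, so one padding zero is decoded as well
        System.out.println(words(b)); // prints "[abcd, 124]"
    }
}
```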

huangapple
  • Posted on August 25, 2020, 18:58:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63577406.html