July 29, 2020 00:20:39

Read binary stream containing unsigned numbers

Question

I want to read a binary file containing 32-bit and 8-bit unsigned integers. I already know about DataInputStream, but its readInt method returns signed integers, and there is no method for reading unsigned 32-bit ints (such methods exist for 16-bit and 8-bit integers).

Is reading separate bytes and concatenating them bitwise the "official" way to do it? Would reading bytes into a ByteArray and composing integers from them using bit shifts and bitwise ORs significantly decrease performance?

Answer 1

Score: 1

You can use

long value = Integer.toUnsignedLong(dataInputStream.readInt());

This is equivalent to the pre-Java 8 code

long value = dataInputStream.readInt() & 0xFFFFFFFFL;

The key point is that signed and unsigned are just different interpretations of the same bit pattern, so readInt() is always sufficient for reading the four-byte quantity. The operation above converts the result to a signed long, a data type that can represent every unsigned int value.
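As a minimal sketch of this idea, the example below reads one 32-bit and one 8-bit unsigned value from an in-memory stream. The class name UnsignedReadSketch, the method readRecord, and the sample bytes are made up for illustration. It assumes big-endian data, which is what DataInputStream reads; a little-endian file would additionally need Integer.reverseBytes on the value returned by readInt().

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class UnsignedReadSketch {
    /** Reads a 32-bit unsigned value followed by an 8-bit unsigned value (big-endian). */
    static long[] readRecord(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            long u32 = Integer.toUnsignedLong(in.readInt()); // same as readInt() & 0xFFFFFFFFL
            int u8 = in.readUnsignedByte();                  // already in the range 0..255
            return new long[] { u32, u8 };
        } catch (IOException e) {
            // Only reached if the array is too short (EOFException) for this in-memory stream
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: 0xFFFFFFFE followed by 0xC8
        byte[] sample = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE, (byte) 0xC8 };
        long[] values = readRecord(sample);
        System.out.println(values[0]); // 4294967294
        System.out.println(values[1]); // 200
    }
}
```

For a real file, wrapping the FileInputStream in a BufferedInputStream before handing it to DataInputStream is likely to matter more for throughput than the conversion itself.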

But since the int already holds all the information, there is no need to convert it to a long immediately. The two's complement representation used for signed numbers even allows performing the basic operations +, -, and * without differentiating between signed and unsigned numbers. For other operations, Java 8 introduced methods that interpret the int value as unsigned, such as Integer.compareUnsigned, Integer.divideUnsigned, Integer.remainderUnsigned, Integer.parseUnsignedInt, and Integer.toUnsignedString.
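A short sketch of the difference, using only methods from java.lang.Integer. The class name UnsignedOpsSketch, the helper addUnsigned, and the sample values are illustrative choices, not part of the original answer.

```java
class UnsignedOpsSketch {
    /** Adds two ints with the ordinary + and renders the result as unsigned decimal. */
    static String addUnsigned(int a, int b) {
        // The addition is identical for signed and unsigned; only the rendering differs.
        return Integer.toUnsignedString(a + b);
    }

    public static void main(String[] args) {
        int a = 0xFFFFFFF0; // 4294967280 as unsigned, -16 as signed
        int b = 0x00000020; // 32

        System.out.println(addUnsigned(a, b)); // 16 (wraps modulo 2^32)

        // Comparison, division and remainder need the unsigned variants
        System.out.println(a < b);                                       // true (signed view)
        System.out.println(Integer.compareUnsigned(a, b) > 0);           // true (unsigned view)
        System.out.println(Integer.divideUnsigned(a, b));                // 134217727
        System.out.println(Integer.remainderUnsigned(a, b));             // 16
        System.out.println(Integer.parseUnsignedInt("4294967280") == a); // true
    }
}
```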

A practical example I encountered is parsing class files. These files have sizes encoded as unsigned int in some places, but with most standard Java APIs, class files are delivered as byte arrays or ByteBuffer instances, which cannot contain more than 2³¹ − 1 bytes. So dealing with larger numbers is an unnecessary complication for something that can't be correct anyway, as a class file containing such a large size specification must be truncated.

So the code to handle this looks basically like:

int size = input.readInt();
if (Integer.compareUnsigned(size, Integer.MAX_VALUE) > 0) throw new IllegalArgumentException(
    "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
// just use the int value

or without Java 8 features (even simpler, as long as the reader understands two's complement):

int size = input.readInt();
if (size < 0) throw new IllegalArgumentException(
    "truncated class file (attribute size " + (size & 0xFFFFFFFFL) + ')');
// just use the int value
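Since class files often arrive as a ByteBuffer, the same check can be sketched against that API. The class AttributeSizeSketch and the method readAttributeSize are hypothetical names, and the extra comparison with remaining() is an assumption added here, not part of the code above.

```java
import java.nio.ByteBuffer;

class AttributeSizeSketch {
    /** Reads an unsigned 32-bit size and rejects values the buffer cannot hold. */
    static int readAttributeSize(ByteBuffer buf) {
        int size = buf.getInt(); // ByteBuffer is big-endian by default, like DataInputStream
        // A negative int means the unsigned value exceeds Integer.MAX_VALUE.
        // Assumption: a size larger than the remaining bytes is also treated as truncation.
        if (size < 0 || size > buf.remaining()) {
            throw new IllegalArgumentException(
                "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
        }
        return size; // safe to use as an ordinary int from here on
    }

    public static void main(String[] args) {
        // Hypothetical input: size 2 followed by two payload bytes
        ByteBuffer buf = ByteBuffer.wrap(new byte[] { 0, 0, 0, 2, 1, 2 });
        System.out.println(readAttributeSize(buf)); // 2
    }
}
```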

(see also this answer)
