Read binary stream containing unsigned numbers

Question

I want to read a binary file containing 32-bit unsigned integers and 8-bit unsigned integers. I already know DataInputStream, but its readInt method returns signed integers and there is no method for reading unsigned ints (there are such methods for 16-bit and 8-bit integers).

Is reading separate bytes and concatenating them bitwise the “official” way to do it? Would reading bytes into a ByteArray and composing integers from them using bit shifts and bitwise ORs significantly decrease performance?
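
For comparison, the manual approach described in the question (composing values from raw bytes with shifts and ORs) can be sketched as follows. This is a minimal sketch assuming big-endian byte order, which is the order DataInputStream uses; the class and method names (ManualUnsigned, u4, u1) are hypothetical, and the sketch makes no claim about performance. Each byte is masked with 0xFF because Java's byte type is signed.

```java
/** Hypothetical helpers that compose unsigned values from raw big-endian bytes. */
public final class ManualUnsigned {
    private ManualUnsigned() {}

    /** Unsigned 32-bit value from buf[offset..offset+3], most significant byte first. */
    static long u4(byte[] buf, int offset) {
        // byte is signed in Java, so each byte is masked with 0xFF to drop sign
        // extension; the first byte is widened to long so bit 31 stays positive.
        return ((long) (buf[offset] & 0xFF) << 24)
                | ((buf[offset + 1] & 0xFF) << 16)
                | ((buf[offset + 2] & 0xFF) << 8)
                | (buf[offset + 3] & 0xFF);
    }

    /** Unsigned 8-bit value at buf[offset]. */
    static int u1(byte[] buf, int offset) {
        return buf[offset] & 0xFF;
    }

    public static void main(String[] args) {
        // Hypothetical sample record: a u4 of 0xFFFFFFFF followed by a u1 of 0xC8.
        byte[] record = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xC8 };
        System.out.println(u4(record, 0)); // 4294967295
        System.out.println(u1(record, 4)); // 200
    }
}
```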

Answer 1

Score: 1

You can use

long value = Integer.toUnsignedLong(dataInputStream.readInt());

This is equivalent to the pre-Java 8 code

long value = dataInputStream.readInt() & 0xFFFFFFFFL;

The key point is that signed and unsigned are just different interpretations of the same bit pattern; to read a four-byte quantity, readInt() is always sufficient. The operation above converts the value to a (signed) long, a data type capable of representing all values of an unsigned int.
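
A minimal sketch of reading a stream that mixes 32-bit and 8-bit unsigned values, assuming big-endian data as DataInputStream expects. The class name, helper names and sample bytes are hypothetical; the 8-bit case relies on DataInputStream.readUnsignedByte(), which already exists.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public final class ReadUnsignedDemo {
    private ReadUnsignedDemo() {}

    /** Reads a big-endian u4 via readInt() and widens it without sign extension (Java 8+). */
    static long readU4(DataInputStream in) throws IOException {
        return Integer.toUnsignedLong(in.readInt());
    }

    /** Same result without Java 8: mask off the sign-extended upper 32 bits. */
    static long readU4Legacy(DataInputStream in) throws IOException {
        return in.readInt() & 0xFFFFFFFFL;
    }

    /** 8-bit unsigned values are already covered by DataInputStream. */
    static int readU1(DataInputStream in) throws IOException {
        return in.readUnsignedByte();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: u4 = 0xFFFFFFFF followed by u1 = 0xC8.
        byte[] data = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xC8 };
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readU4(in)); // 4294967295
        System.out.println(readU1(in)); // 200

        // A fresh stream over the same bytes, read with the pre-Java 8 variant.
        DataInputStream again = new DataInputStream(new ByteArrayInputStream(data));
        System.out.println(readU4Legacy(again)); // 4294967295
    }
}
```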

But since the int already holds all the information, there is no need to convert it to a long immediately. The Two’s Complement used to represent signed numbers even allows performing the basic operations +, - and * without distinguishing between signed and unsigned numbers. For other operations, Java 8 introduced methods in Integer that interpret the int value as unsigned: compareUnsigned, divideUnsigned, remainderUnsigned, toUnsignedString, parseUnsignedInt and toUnsignedLong.
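
The following sketch illustrates the claim: addition, subtraction and multiplication yield identical bits under either interpretation, while comparison, division and remainder need the unsigned variants from Integer. The class name and the constants A and B are hypothetical example values.

```java
public final class UnsignedIntOps {
    private UnsignedIntOps() {}

    static final int A = 0xFFFFFFF0; // signed -16, unsigned 4294967280
    static final int B = 16;         // 16 under either interpretation

    public static void main(String[] args) {
        // +, - and * produce the same bit pattern for signed and unsigned operands;
        // only the interpretation of the result differs.
        System.out.println(A - B);                           // -32
        System.out.println(Integer.toUnsignedString(A - B)); // 4294967264
        System.out.println(Integer.toUnsignedString(A * 2)); // 4294967264
        System.out.println(A + B);                           // 0 (wraps modulo 2^32)

        // Comparison, division, remainder and parsing depend on the interpretation,
        // hence the unsigned variants in Integer.
        System.out.println(A < B);                             // true  (-16 < 16)
        System.out.println(Integer.compareUnsigned(A, B) > 0); // true  (4294967280 > 16)
        System.out.println(A / B);                             // -1
        System.out.println(Integer.divideUnsigned(A, B));      // 268435455
        System.out.println(Integer.remainderUnsigned(A, 7));   // 2
        System.out.println(Integer.parseUnsignedInt("4294967280") == A); // true
    }
}
```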

A practical example I encountered is parsing class files. These files encode sizes as unsigned ints in some places, but with most standard Java APIs, class files are delivered as byte arrays or ByteBuffer instances, which cannot hold more than 2³¹ − 1 bytes. So dealing with larger numbers is an unnecessary complication for something that can’t be correct anyway, as a class file containing such a large size specification must be truncated.

So the code to handle this looks basically like:

int size = input.readInt();
if (Integer.compareUnsigned(size, Integer.MAX_VALUE) > 0) throw new IllegalArgumentException(
    "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
// just use the int value

or, without Java 8 features (even simpler, as long as the reader understands the Two’s Complement):

int size = input.readInt();
if (size < 0) throw new IllegalArgumentException(
    "truncated class file (attribute size " + (size & 0xFFFFFFFFL) + ')');
// just use the int value
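
A self-contained sketch of the check above, wrapped in a hypothetical readAttributeSize method that reads from a DataInputStream. It uses the size < 0 test, which for an int is equivalent to Integer.compareUnsigned(size, Integer.MAX_VALUE) > 0, and Integer.toUnsignedString for the message. The class name, readAttributeSize and the sizeOf helper are assumptions for illustration, not part of any class-file API.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public final class AttributeSize {
    private AttributeSize() {}

    /** Reads a u4 size and rejects values that cannot fit into a byte[] or ByteBuffer. */
    static int readAttributeSize(DataInputStream input) throws IOException {
        int size = input.readInt();
        // An unsigned value above Integer.MAX_VALUE has bit 31 set, i.e. size < 0.
        if (size < 0) {
            throw new IllegalArgumentException(
                    "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
        }
        return size; // just use the int value
    }

    /** Hypothetical convenience helper: reads a size from four raw bytes. */
    static int sizeOf(byte... bytes) throws IOException {
        return readAttributeSize(new DataInputStream(new ByteArrayInputStream(bytes)));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sizeOf((byte) 0, (byte) 0, (byte) 1, (byte) 0)); // 256
        try {
            sizeOf((byte) 0x80, (byte) 0, (byte) 0, (byte) 0);
        } catch (IllegalArgumentException e) {
            // truncated class file (attribute size 2147483648)
            System.out.println(e.getMessage());
        }
    }
}
```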

(see also this answer)

huangapple
  • Posted on 2020-07-29 00:20:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63138575.html