Read binary stream containing unsigned numbers
Question
I want to read a binary file containing 32-bit and 8-bit unsigned integers. I already know about DataInputStream, but its readInt method returns signed integers, and there is no method for reading unsigned ints (there are such methods for 16-bit and 8-bit integers).
Is reading separate bytes and concatenating them bitwise the “official” way to do it? Would reading bytes into a ByteArray and composing integers from them using bit shifts and bitwise or operations significantly decrease performance?
Answer 1
Score: 1
You can use
long value = Integer.toUnsignedLong(dataInputStream.readInt());
This is equivalent to the pre-Java 8 code
long value = dataInputStream.readInt() & 0xFFFFFFFFL;
The key point is that signed and unsigned are just different interpretations of the same bit pattern, so readInt() is always sufficient to read a four-byte quantity. The operation above converts the result to a signed long, a data type capable of representing every unsigned int value.
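As a minimal sketch of the conversion described above, the following reads one unsigned 32-bit value followed by one unsigned 8-bit value from an in-memory stream. The class name UnsignedReadDemo, the helper readRecord, and the record layout are assumptions made for illustration; note that DataInputStream reads multi-byte values in big-endian order, so a little-endian file would need different handling.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class UnsignedReadDemo {
    // Hypothetical helper: reads a big-endian unsigned 32-bit value
    // followed by an unsigned 8-bit value from the given bytes.
    static long[] readRecord(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            // readInt() returns the raw 32 bits; widening without sign
            // extension yields the unsigned interpretation as a long.
            long u32 = Integer.toUnsignedLong(in.readInt());
            // readUnsignedByte() already returns a value in 0..255.
            int u8 = in.readUnsignedByte();
            return new long[] { u32, u8 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE, (byte) 0xC8 };
        long[] record = readRecord(data);
        System.out.println(record[0] + " " + record[1]); // prints "4294967294 200"
    }
}
```

For real files, wrapping the FileInputStream in a BufferedInputStream before the DataInputStream is the usual way to avoid one system call per read.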
But since the int already holds all the information, there is no need to convert it to a long immediately. The two's complement representation used for signed numbers even allows basic operations, i.e. +, -, and *, to be performed without differentiating between signed and unsigned numbers. For other operations, Java 8 introduced methods that interpret the int value as unsigned:
Integer.divideUnsigned(…)
Integer.remainderUnsigned(…)
Integer.compareUnsigned(…)
Integer.toUnsignedString(…)
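The methods listed above can be tried on a plain int without ever widening to long. The sketch below is only an illustration; the class name UnsignedOpsDemo and the helper describe are made up for this example.

```java
final class UnsignedOpsDemo {
    // Hypothetical helper: formats an unsigned division of two ints
    // whose bit patterns are interpreted as unsigned 32-bit values.
    static String describe(int a, int b) {
        return Integer.toUnsignedString(a) + " / " + Integer.toUnsignedString(b)
            + " = " + Integer.toUnsignedString(Integer.divideUnsigned(a, b))
            + " rem " + Integer.toUnsignedString(Integer.remainderUnsigned(a, b));
    }

    public static void main(String[] args) {
        int a = 0xFFFFFFFE; // -2 as signed, 4294967294 as unsigned

        // Addition needs no special method: the bits are the same either way.
        System.out.println(Integer.toUnsignedString(a + 1)); // prints "4294967295"

        // Division, remainder and comparison depend on the interpretation.
        System.out.println(describe(a, 3));                     // prints "4294967294 / 3 = 1431655764 rem 2"
        System.out.println(Integer.compareUnsigned(a, 1) > 0);  // prints "true"
        System.out.println(a > 1);                              // prints "false" (signed comparison)
    }
}
```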
A practical example I encountered is parsing class files. These files have sizes encoded as unsigned int in some places, but with most standard Java APIs, class files are delivered as byte arrays or ByteBuffer instances, which cannot contain more than 2³¹−1 bytes. So dealing with larger numbers is an unnecessary complication for something that can’t be correct anyway, as a class file containing such a large size specification must be truncated.
So the code to handle this looks basically like:
int size = input.readInt();
if (Integer.compareUnsigned(size, Integer.MAX_VALUE) > 0) {
    throw new IllegalArgumentException(
        "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
}
// just use the int value
or, without Java 8 features (even simpler, as long as the reader understands two's complement):
int size = input.readInt();
if (size < 0) {
    throw new IllegalArgumentException(
        "truncated class file (attribute size " + (size & 0xFFFFFFFFL) + ')');
}
// just use the int value
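To make the size check above runnable in isolation, here is a small sketch that feeds it a four-byte size field from a byte array. The class AttributeSizeDemo and the helper readSize are hypothetical names, and the input is reduced to a single size field rather than a real class file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class AttributeSizeDemo {
    // Hypothetical helper: reads a 4-byte big-endian size field and
    // rejects values that do not fit into a non-negative int.
    static int readSize(byte[] data) {
        int size;
        try (DataInputStream input = new DataInputStream(new ByteArrayInputStream(data))) {
            size = input.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // A negative int means the unsigned value exceeds Integer.MAX_VALUE.
        if (size < 0) {
            throw new IllegalArgumentException(
                "truncated class file (attribute size " + Integer.toUnsignedString(size) + ')');
        }
        return size; // safe to use as an ordinary int from here on
    }

    public static void main(String[] args) {
        System.out.println(readSize(new byte[] { 0, 0, 0, 5 })); // prints "5"
        try {
            readSize(new byte[] { (byte) 0x80, 0, 0, 0 });
        } catch (IllegalArgumentException e) {
            // prints "truncated class file (attribute size 2147483648)"
            System.out.println(e.getMessage());
        }
    }
}
```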
(see also this answer)