Seeking explanation Long -> Byte Array -> String -> Byte Array -> Long

Question

I'm seeking an explanation for an oddity I've seen in someone else's code. They were retrieving an "int64" value from an LDAP attribute through a third-party library, which returned it as a byte array. To get the value they were trying something like:

String s = new String(bytesFrom3rdParty);
BigInteger i = new BigInteger(s.getBytes());
System.out.println(i.toString());

With some long values this gave unexpected, incorrect output. Two things stood out to me:

  1. Why go from byte array -> String -> byte array -> BigInteger?
  2. Why use a BigInteger for a 64-bit numeric value?

Anyway, I did a little experiment:

private static byte[] longToByteArray(Long l) {
	return ByteBuffer.allocate(Long.SIZE / Byte.SIZE).putLong(l).array();
}

private static Long byteArrayToLong(byte[] bytes) {
	return ByteBuffer.wrap(bytes).getLong();
}

public static void main(String[] args) {
	
	for (long l = 0L; l < 1000; l++) {
		byte[] origBytes = longToByteArray(l);
		String s = new String(origBytes);
		byte[] stringBytes = s.getBytes();
		Long origL = byteArrayToLong(origBytes);
		Long stringL = byteArrayToLong(stringBytes);
		System.out.println(origL.toString() + " " + stringL.toString());
	}
	
}

As I suspected, skipping the conversion to a String and back to a byte array fixed the issue. The output from the above is something like:

124 124
125 125
126 126
127 127
128 239
129 239
130 239
131 239
132 239

And then the right-hand value corrects itself again when it hits 256:

254 239
255 239
256 256
257 257
258 258
259 259
260 260
261 261
262 262
263 263
264 264

So a couple of questions from me:

  1. Why is the right-hand value wrong? I assume it's something to do with the conversion from a 64-bit long value to a 32-bit String value?
  2. Why doesn't the incorrect value change until the value of l gets to 256?

Answer 1

Score: 2

A byte[] can hold different things, for example:

  • a serialized String value (UTF-8 encoded, for example): "123" -> the bytes of the text, one byte per digit in UTF-8
  • a serialized long value in binary: 123 -> 8 bytes representing one number

So converting a byte[] to a String makes sense when the byte[] actually contains text; you then parse the String into a number (a BigInteger in your case). Converting the String back to bytes doesn't make much sense to me.

String s = new String(bytesFrom3rdParty); // decode the bytes as text (platform default charset)
BigInteger i = new BigInteger(s); // parse the String "123" into a BigInteger
System.out.println(i.toString()); // prints 123

This will work too:

String s = new String(bytesFrom3rdParty); // decode the bytes as text (platform default charset)
Long i = Long.parseLong(s); // parse the String "123" into a Long
System.out.println(i.toString()); // prints 123

Your example is the second case: you serialize a long in binary form to a byte[] (not as a UTF-8 string), build a String from that binary data, and then ask the String for its bytes again. The String constructor decodes the bytes using the platform's default charset and expects them to be valid in that charset. Bytes that are not valid are replaced during decoding, so the bytes you get back no longer match your original binary representation.

When you then try to build a long from those bytes, it breaks. Why at 128? Bytes 0 to 127 are plain ASCII, which is valid in most default charsets (UTF-8 included), so they survive the round trip. With a UTF-8 default charset, a lone byte of 128 or above is not a valid sequence, so it does not.
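To see how much the result depends on the charset, here is a minimal sketch that round-trips the 8 bytes of 128 through a String with two explicit charsets. The class and method names are made up for illustration, and UTF-8 stands in for the platform default, which is an assumption about the asker's machine.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetCompareDemo {
    // Decode the bytes as text in the given charset, then encode the text again.
    static byte[] roundTrip(byte[] raw, Charset charset) {
        return new String(raw, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // 128 as 8 big-endian bytes: seven zero bytes followed by 0x80 (-128 as a signed byte).
        byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(128L).array();

        System.out.println(Arrays.toString(raw));
        // UTF-8: the lone 0x80 is malformed, so it is replaced and re-encoded as 3 bytes.
        System.out.println(Arrays.toString(roundTrip(raw, StandardCharsets.UTF_8)));
        // ISO-8859-1: every byte value maps to a character, so the bytes come back unchanged.
        System.out.println(Arrays.toString(roundTrip(raw, StandardCharsets.ISO_8859_1)));
    }
}
```

A single-byte charset such as ISO-8859-1 happens to survive the round trip, but relying on that is fragile; reading the binary data directly avoids the question entirely.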

  • a serialized String value should be parsed with Long.parseLong(String) or new BigInteger(String)
  • a binary serialized number should be read as binary with ByteBuffer.getLong()
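The two cases can be sketched side by side as follows. This is only an illustration: the class and method names are hypothetical, and which of the two formats the LDAP library actually returns is not known from the question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // Case 1: the bytes hold the decimal digits of the number as text.
    static long parseText(byte[] textBytes) {
        return Long.parseLong(new String(textBytes, StandardCharsets.UTF_8));
    }

    // Case 2: the bytes hold the number itself as 8 big-endian bytes.
    static long readBinary(byte[] binaryBytes) {
        return ByteBuffer.wrap(binaryBytes).getLong();
    }

    public static void main(String[] args) {
        // "123" as UTF-8 text is 3 bytes: 0x31 0x32 0x33.
        byte[] asText = "123".getBytes(StandardCharsets.UTF_8);
        // 123 as a binary long is 8 bytes.
        byte[] asBinary = ByteBuffer.allocate(Long.BYTES).putLong(123L).array();

        System.out.println(parseText(asText));    // prints 123
        System.out.println(readBinary(asBinary)); // prints 123
    }
}
```

Feeding the text bytes to readBinary, or the binary bytes to parseText, fails or gives a meaningless number, which is why it matters to know which format the library hands back.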

Answer 2

Score: 2

Let's make it a little simpler: byte[] -> String -> byte[] performs a decode followed by an encode. When you use new String(byte[] b), it:

>Constructs a new String by decoding the specified array of bytes using the platform's default charset.

What happens if the bytes are not valid in your platform's default charset?

>The behavior of this constructor when the given bytes are not valid in the default charset is unspecified.

So, in your situation, when an invalid byte is passed, it is converted to the character 65533 (U+FFFD), the Unicode replacement character.

byte[] b = {-1};
System.out.println( Arrays.toString( new String(b).getBytes() ) );

> [-17, -65, -67]

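That is why the value doesn't change: every invalid byte is mapped to the same replacement character. Its first encoded byte, -17, is 0xEF, or 239 unsigned. The replacement lands in the last of the 8 bytes that getLong() reads, so the result is 239 for every value from 128 to 255. At 256 the last two bytes are 1 and 0, both valid, so the value round-trips correctly again.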
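The whole pattern can be reproduced with a short sketch. It pins the charset to UTF-8 explicitly instead of relying on the platform default, which is an assumption about the asker's environment; the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Encode a long as 8 big-endian bytes, the same layout the question's helper produces.
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode the bytes as UTF-8 text and encode the text again, which is what
    // new String(b).getBytes() does when the default charset is UTF-8.
    static byte[] viaString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Read the first 8 bytes as a long; any extra trailing bytes are ignored.
    static long firstLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    public static void main(String[] args) {
        for (long value : new long[] {127L, 128L, 255L, 256L, 383L, 384L}) {
            byte[] mangled = viaString(toBytes(value));
            System.out.println(value + " -> " + mangled.length + " bytes -> " + firstLong(mangled));
        }
        // Prints:
        // 127 -> 8 bytes -> 127
        // 128 -> 10 bytes -> 239
        // 255 -> 10 bytes -> 239
        // 256 -> 8 bytes -> 256
        // 383 -> 8 bytes -> 383
        // 384 -> 10 bytes -> 495
    }
}
```

Note that the mangled array grows to 10 bytes, because one invalid byte becomes the three bytes of the replacement character. The values go wrong again at 384, where the low byte is 0x80 once more.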

As for BigInteger, it might have been used simply for easy access to a constructor that takes a byte[], as a way to create a long.
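If that was the intent, a minimal sketch of it would pass the raw bytes straight to BigInteger, with no String in between. The class and method names here are hypothetical, and the sketch assumes the library returns the value as big-endian two's-complement bytes, which is what the BigInteger(byte[]) constructor expects.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;

class BigIntegerDemo {
    // Interpret the raw bytes as a big-endian two's-complement integer.
    // longValueExact() throws ArithmeticException if the value does not fit in a long.
    static long viaBigInteger(byte[] raw) {
        return new BigInteger(raw).longValueExact();
    }

    public static void main(String[] args) {
        for (long value : new long[] {128L, 255L, -1L}) {
            byte[] raw = ByteBuffer.allocate(Long.BYTES).putLong(value).array();
            System.out.println(value + " -> " + viaBigInteger(raw));
        }
        // Prints:
        // 128 -> 128
        // 255 -> 255
        // -1 -> -1
    }
}
```

For a value that is always 8 bytes, ByteBuffer.wrap(raw).getLong() does the same job without the intermediate object.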

huangapple
  • Published on August 3, 2020 at 16:07:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63225829.html