Java: Byte array prints unknown values for the same string
Question
I have the following string, which is stored both in a text file and as a variable in Java: ‘destructive’
My code is below:
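    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.util.Arrays;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.apache.commons.io.FileUtils; // Apache Commons IO

    public class SimpleTest {
        public static void main(String[] args) {
            try {
                // Read the file as UTF-8 text, then re-encode it to UTF-8 bytes
                File file = new File("TestFIle.txt");
                byte[] file_encoded = FileUtils.readFileToString(file, "UTF-8").getBytes("UTF-8");
                System.out.println(Arrays.toString(file_encoded));
                // Encode the same text from a string literal
                String toEncrypt = "‘destructive’";
                byte[] encoded = toEncrypt.getBytes(Charset.forName("UTF-8"));
                System.out.println(Arrays.toString(encoded));
            } catch (IOException ex) {
                Logger.getLogger(SimpleTest.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }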
As you can see, the variable is declared as `String toEncrypt = "‘destructive’";`, and the content of TestFIle.txt is also ‘destructive’.
When I run the code, I get:
[-17, -69, -65, -30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]
[-30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]
What are the additional [-17, -69, -65] bytes at the start of the byte array when I read the same text from the file, and why do I get them?
Answer 1
Score: 1
Your file seems to contain text encoded in UTF-8 with a leading byte order mark (BOM). The BOM for UTF-8 is the byte sequence EF BB BF. Read as signed 8-bit values in two's complement representation, which is how Java's byte type works, these bytes are -17, -69, and -65.
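To make the signed/unsigned relationship concrete, here is a minimal sketch. The class name `SignedByteDemo` and the helper `toUnsigned` are made up for illustration and are not part of the original code; the sketch only relies on standard Java behavior (masking a `byte` with `0xFF` yields its unsigned value).

```java
import java.util.Arrays;

public class SignedByteDemo {
    // Hypothetical helper: map signed Java bytes (-128..127) to unsigned values (0..255).
    static int[] toUnsigned(byte[] bytes) {
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            out[i] = bytes[i] & 0xFF; // mask off the sign extension
        }
        return out;
    }

    public static void main(String[] args) {
        // The UTF-8 BOM, written in hexadecimal
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        System.out.println(Arrays.toString(bom));             // [-17, -69, -65]
        System.out.println(Arrays.toString(toUnsigned(bom))); // [239, 187, 191]
    }
}
```

Each negative value is simply the unsigned value minus 256, for example 239 - 256 = -17.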
Answer 2
Score: 0
The leading `[-17, -69, -65]` is the UTF-8 [byte order mark](https://en.wikipedia.org/wiki/Byte_order_mark).
In hexadecimal the BOM is `[0xEF, 0xBB, 0xBF]`, which is `[239, 187, 191]` in unsigned decimal.
But because Java's `byte` is signed, the numbers are interpreted (and printed) as negative numbers.
In general, the BOM is optional, and it seems to be common in the Microsoft ecosystem: https://superuser.com/questions/1553666/utf-8-vs-utf-8-with-bom
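If the BOM is unwanted, one possible workaround is to drop it after decoding. The sketch below is an assumption about how one might do that, not something from the question: the class `BomStripDemo` and the helper `stripBom` are hypothetical names, and the file content is simulated with the byte values printed in the question. It relies on the fact that Java's UTF-8 decoder keeps the BOM as the character U+FEFF at the start of the string.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BomStripDemo {
    // Hypothetical helper: remove a leading U+FEFF (the decoded BOM), if present.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Simulated file content: the bytes printed in the question (BOM + text)
        byte[] fileBytes = {-17, -69, -65, -30, -128, -104, 100, 101, 115, 116,
                            114, 117, 99, 116, 105, 118, 101, -30, -128, -103};
        String decoded = new String(fileBytes, StandardCharsets.UTF_8);
        String cleaned = stripBom(decoded);

        // The same text as a literal, with the quotes written as Unicode escapes
        String literal = "\u2018destructive\u2019";
        System.out.println(cleaned.equals(literal)); // true
        System.out.println(Arrays.toString(cleaned.getBytes(StandardCharsets.UTF_8)));
    }
}
```

After stripping, the re-encoded bytes match those of the string literal, so both arrays in the question would print identically. Saving the file as UTF-8 without a BOM would avoid the issue at the source.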