Java: Byte array prints unknown values for the same string

Question

I have the following string, which is stored in a text file and also as a variable in Java: ‘destructive’

My code is below:

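import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.Arrays;
import java.util.logging.Level;
import java.util.logging.Logger;

// Apache Commons IO (assumed from the readFileToString call)
import org.apache.commons.io.FileUtils;

public class SimpleTest {

    public static void main(String[] args) {
        try {
            // Decode the file as UTF-8, then re-encode the resulting string to bytes.
            File file = new File("TestFIle.txt");
            byte[] file_encoded = FileUtils.readFileToString(file, "UTF-8").getBytes("UTF-8");
            System.out.println(Arrays.toString(file_encoded));

            // Encode the same text from a string literal for comparison.
            String toEncrypt = "‘destructive’";
            byte[] encoded = toEncrypt.getBytes(Charset.forName("UTF-8"));
            System.out.println(Arrays.toString(encoded));
        } catch (IOException ex) {
            Logger.getLogger(SimpleTest.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
}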

As you can see, the variable is:

String toEncrypt = "‘destructive’";

The content of TestFIle.txt is also: ‘destructive’

When I run the code, I get:

[-17, -69, -65, -30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]
[-30, -128, -104, 100, 101, 115, 116, 114, 117, 99, 116, 105, 118, 101, -30, -128, -103]

What are the additional [-17, -69, -65] at the start of the byte array when reading the same text from a file, and why do I get them?

Answer 1

Score: 1

Your file seems to contain text encoded in UTF-8 with a leading byte order mark (BOM). The BOM for UTF-8 is EF BB BF. Read as signed 8-bit two's complement values, which is how Java's byte type works, these are -17, -69, and -65.
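
To check the arithmetic, here is a minimal sketch (the class and method names are only illustrative). It narrows the three hex values to Java's signed byte type and also encodes the code point U+FEFF, which is the character used as the BOM, as UTF-8. Both should print the same three numbers seen at the start of the file's byte array.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomBytesDemo {

    // Encodes U+FEFF (the code point used as the byte order mark) as UTF-8.
    static byte[] utf8Bom() {
        return "\uFEFF".getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Narrow the hex values EF BB BF to Java's signed byte type.
        byte[] fromHex = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        System.out.println(Arrays.toString(fromHex));   // [-17, -69, -65]
        System.out.println(Arrays.toString(utf8Bom())); // [-17, -69, -65]
    }
}
```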

Answer 2

Score: 0

The leading [-17, -69, -65] is the UTF-8 byte order mark (BOM): https://en.wikipedia.org/wiki/Byte_order_mark
In hexadecimal the BOM is [0xEF, 0xBB, 0xBF], which is [239, 187, 191] in unsigned decimal.
But because Java's byte is signed, values of 128 and above are interpreted (and printed) as negative numbers.
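
If you would rather see the unsigned or hexadecimal values when debugging, something along these lines should work. This is only a sketch: the helper names toUnsignedString and toHexString are made up for this example, while Byte.toUnsignedInt is part of the standard library since Java 8.

```java
class UnsignedByteDemo {

    // Formats each byte as its unsigned decimal value (0..255).
    static String toUnsignedString(byte[] bytes) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(Byte.toUnsignedInt(bytes[i]));
        }
        return sb.append("]").toString();
    }

    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHexString(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bom = {-17, -69, -65};
        System.out.println(toUnsignedString(bom)); // [239, 187, 191]
        System.out.println(toHexString(bom));      // EF BB BF
    }
}
```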

In general, the BOM is optional in UTF-8, and it seems to be common in the Microsoft ecosystem: https://superuser.com/questions/1553666/utf-8-vs-utf-8-with-bom
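
Since the BOM is optional, one way to make the two byte arrays match is to drop a leading U+FEFF after decoding the file. The sketch below assumes, as the output in the question suggests, that Java's UTF-8 decoder keeps the BOM as the first character of the string. The stripBom helper is hypothetical, not a library method, and the file content is simulated with a string literal.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BomStripDemo {

    // Removes a single leading U+FEFF if present; otherwise returns the input unchanged.
    static String stripBom(String text) {
        return text.startsWith("\uFEFF") ? text.substring(1) : text;
    }

    public static void main(String[] args) {
        // Simulates the decoded file content: a BOM followed by the quoted word.
        String fromFile = "\uFEFF\u2018destructive\u2019";
        String literal = "\u2018destructive\u2019";

        byte[] a = stripBom(fromFile).getBytes(StandardCharsets.UTF_8);
        byte[] b = literal.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(a, b)); // true
    }
}
```

Alternatively, saving the file as UTF-8 without a BOM in the editor avoids the extra bytes at the source.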

Posted by huangapple on 2020-10-19 19:21:54
When reposting, please keep this link: https://go.coder-hub.com/64426269.html