Files.readAllLines prepending unprintable characters

Question

I am reading a large JSON file into a String using Java (OpenJDK 8).

The code I am using is:

final String fileContents = (Files.readAllLines(Paths.get(filePath.toString()))).stream().collect(Collectors.joining());

The resulting String has some unprintable characters at the start of the string which aren't in the file:

Eclipse shows these characters as [-1,-2] before the {"TIPL etc., which is where the actual file content begins.

What is wrong here? What can I do to get Java to read the file correctly?

Answer 1

Score: 4

Your file is encoded as UTF-16LE (little-endian) and starts with a byte-order mark (FF FE).

The default encoding used by Files.readAllLines() is UTF-8, which is why you're seeing the byte-order mark (BOM) characters and NUL characters in your string data.
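One likely reading of the [-1,-2] in the question is that these are the BOM bytes FF FE shown as signed Java bytes. The sketch below illustrates that and checks a byte array for the marker. The class name BomCheck and the helper hasUtf16LeBom are made up for this illustration, and the sample bytes are hand-built rather than taken from the asker's file.

```java
class BomCheck {

    // Hypothetical helper: true when the data starts with FF FE,
    // the UTF-16 little-endian byte-order mark.
    static boolean hasUtf16LeBom(byte[] data) {
        return data.length >= 2
                && data[0] == (byte) 0xFF
                && data[1] == (byte) 0xFE;
    }

    public static void main(String[] args) {
        // Hand-built sample: BOM followed by '{' and '"' in UTF-16LE.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, '{', 0, '"', 0};

        // Java bytes are signed, so FF and FE print as -1 and -2.
        System.out.println(sample[0] + ", " + sample[1]);
        System.out.println(hasUtf16LeBom(sample));
    }
}
```

To run the same check on a real file, the first two bytes could be read with Files.readAllBytes or an InputStream and passed to the helper.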

You should pass a character set as your second parameter to Files.readAllLines():

Files.readAllLines(path, StandardCharsets.UTF_16);

The StandardCharsets.UTF_16 encoding will automatically interpret the BOM, and parse your string content accordingly. The Javadoc for the Charset class contains additional information on how byte-order marks are encoded and decoded using various character encodings.
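As a rough end-to-end sketch of this fix, the example below writes a small UTF-16LE file with an FF FE BOM and reads it back with StandardCharsets.UTF_16. The class name ReadUtf16Json, the helpers readUtf16 and roundTrip, and the sample JSON text are assumptions made for the demo, and String.join stands in for the Collectors.joining() call from the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ReadUtf16Json {

    // Reads all lines as UTF-16 and joins them without separators.
    // The decoder uses a leading BOM to pick the byte order and does
    // not include it in the decoded text.
    static String readUtf16(Path path) throws IOException {
        return String.join("", Files.readAllLines(path, StandardCharsets.UTF_16));
    }

    // Hypothetical demo helper: writes text as UTF-16LE with an FF FE BOM
    // to a temporary file, then reads it back with readUtf16.
    static String roundTrip(String text) {
        try {
            Path sample = Files.createTempFile("sample", ".json");
            try {
                byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
                byte[] bytes = new byte[body.length + 2];
                bytes[0] = (byte) 0xFF;
                bytes[1] = (byte) 0xFE;
                System.arraycopy(body, 0, bytes, 2, body.length);
                Files.write(sample, bytes);
                return readUtf16(sample);
            } finally {
                Files.deleteIfExists(sample);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The decoded text starts directly with '{', with no BOM in front.
        System.out.println(roundTrip("{\"TIPL\": 1}"));
    }
}
```

If the file is known never to carry a BOM, StandardCharsets.UTF_16LE would be the more explicit choice, since UTF_16 falls back to big-endian when no BOM is present.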

huangapple
  • Posted on 2020-09-23 17:36:02
  • Link: https://go.coder-hub.com/64025072.html