Files.readAllLines prepending unprintable characters
Question
I am reading a large JSON file into a string using Java (OpenJDK 8).
The code I am using is:
final String fileContents = (Files.readAllLines(Paths.get(filePath.toString()))).stream().collect(Collectors.joining());
The resulting string starts with some unprintable characters that aren't in the file: Eclipse shows them as [-1,-2] before the {"TIPL etc., which is the actual file content.
What is wrong here? What can I do to get Java to read the file correctly?
Answer 1
Score: 4
Your file is encoded as UTF-16LE (little-endian) and starts with a byte-order mark (BOM), the bytes FF FE. Read as signed Java bytes, FF FE is -1, -2, which matches what Eclipse shows.
The default encoding used by Files.readAllLines() is UTF-8, so that's why you're seeing the BOM characters and NUL characters in your string data.
You should pass a character set as the second parameter to Files.readAllLines():
Files.readAllLines(path, StandardCharsets.UTF_16);
The StandardCharsets.UTF_16 charset automatically interprets the BOM when decoding and parses the content with the byte order it indicates (if there is no BOM, it assumes big-endian). The Javadoc for the Charset class contains additional information on how byte-order marks are encoded and decoded by the various character encodings.
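To check this behaviour without touching the real file, here is a minimal sketch. It assumes a small temporary file written with the same layout the answer describes (an FF FE BOM followed by UTF-16LE text); the class name Utf16ReadExample, the helper readUtf16 and the sample JSON content are made up for illustration and are not from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf16ReadExample {

    // Hypothetical helper: reads all lines as UTF-16 and joins them without a
    // separator, like Collectors.joining() in the question. The UTF-16 decoder
    // uses a leading BOM to pick the byte order and does not return it.
    static String readUtf16(Path path) throws IOException {
        return String.join("", Files.readAllLines(path, StandardCharsets.UTF_16));
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data: BOM (FF FE) followed by UTF-16LE encoded JSON.
        byte[] bom = {(byte) 0xFF, (byte) 0xFE};
        byte[] body = "{\"TIPL\": 1}\n".getBytes(StandardCharsets.UTF_16LE);
        byte[] all = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, all, 0, bom.length);
        System.arraycopy(body, 0, all, bom.length, body.length);

        Path sample = Files.createTempFile("sample", ".json");
        try {
            Files.write(sample, all);
            System.out.println(readUtf16(sample)); // prints {"TIPL": 1}
        } finally {
            Files.delete(sample);
        }
    }
}
```

If the file is always UTF-16, this keeps the fix to a single extra argument; a file whose encoding can vary would need some detection step first, which is outside what this sketch covers.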