MalformedInputException when reading file using Apache FileUtils
Question
FileUtils.readLines(attribFile, StandardCharsets.UTF_8)
This line throws java.nio.charset.MalformedInputException.
What could be the reason for this?
Is it because the file could contain characters that are not supported by UTF_8?
But when I check the JSON response of the HTTP GET request from which the file is built, I don't see any special characters. I only see , - . \n :
Any ideas on how to debug this further?
Things I tried:
There is no error when I use FileUtils.readLines(attribFile). This seems to work properly, although that overload is deprecated.
What worked:
To generate the input file I was using PrintWriter pw = new PrintWriter(attribFile)
I changed this to PrintWriter pw = new PrintWriter(new OutputStreamWriter(new FileOutputStream(attribFile), "UTF-8"))
and it worked fine.
But this really gave me a headache. I am used to writing PrintWriter pw = new PrintWriter(attribFile) without specifying the charset. It seems like a bug or code quality issue that these constructors let you create the object without specifying a charset. That is probably why the FileUtils.readLines overload without a charset is deprecated.
Looking for thoughts on the above.
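For reference, here is a minimal sketch of one way this exception can arise. It assumes the file holds at least one non-ASCII character and was written with a non-UTF-8 charset, which is what new PrintWriter(attribFile) does on a JVM whose default charset is not UTF-8. The class name, method name, and sample text are made up for illustration, and it uses the JDK's Files.readAllLines rather than FileUtils so that it has no dependencies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetMismatchDemo {

    // Writes the text to a temp file using writeCharset, then reports
    // whether a strict UTF-8 read of that file succeeds.
    static boolean readsAsUtf8(String text, Charset writeCharset) {
        try {
            Path file = Files.createTempFile("attrib", ".txt");
            try {
                Files.write(file, text.getBytes(writeCharset));
                // Files.readAllLines decodes strictly and rejects invalid bytes.
                Files.readAllLines(file, StandardCharsets.UTF_8);
                return true;
            } catch (MalformedInputException e) {
                return false;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample: one non-ASCII character (é) among plain ASCII.
        String text = "caf\u00e9, 10-20\n";
        System.out.println(readsAsUtf8(text, StandardCharsets.ISO_8859_1)); // false
        System.out.println(readsAsUtf8(text, StandardCharsets.UTF_8));      // true
    }
}
```

A single non-ASCII byte is enough to trigger the failure, so a response that looks like plain ASCII at a glance may still contain one.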
Answer 1
Score: 1
You wrote
> This seems to be a bug/code quality issue on the methods that it allows to create an object without specifying the charset.
and the simple answer is that you are right, though this is a long-standing, well-known issue, so any code audit tool should be able to warn you about it.
Further, the issue has been addressed in Java 18 (JEP 400). Starting with that version, the standard APIs that take no explicit charset use UTF-8 by default instead of the platform charset.
As mentioned in the comments, you can use the methods in java.nio.file.Files to read and write text without a third-party library. The methods in this class use UTF-8 when no character encoding is specified, in all Java versions.
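As a rough illustration of the java.nio.file.Files approach, the sketch below writes and reads lines without naming a charset. The class name, method name, and sample lines are made up for the example; Files.write(Path, Iterable) and Files.readAllLines(Path) are the standard overloads, and both use UTF-8.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class FilesUtf8RoundTrip {

    // Writes the lines to a temp file and reads them back, never naming a
    // charset: both overloads used here encode and decode as UTF-8.
    static List<String> roundTrip(List<String> lines) {
        try {
            Path file = Files.createTempFile("attrib", ".txt");
            try {
                Files.write(file, lines);        // UTF-8 by specification
                return Files.readAllLines(file); // UTF-8 by specification
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data containing a non-ASCII character.
        List<String> lines = Arrays.asList("name: caf\u00e9", "range: 10-20");
        System.out.println(roundTrip(lines).equals(lines)); // true
    }
}
```

Because the writer and the reader agree on UTF-8 regardless of the platform default, the result does not depend on the machine or Java version the code runs on.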
Answer 2
Score: 0
The cause of your problem is most likely that your file is not encoded in UTF-8 but in some other charset. If you can read the file successfully, here is a diagnostic tool that can help. There is a utility that converts any string to a Unicode sequence and vice versa. If you convert the text of your file into Unicode sequences, you can see which numeric codes make up the text, and from that you may be able to work out which charset was used. The utility class is called StringUnicodeEncoderDecoder and is available as part of the MgntUtils open source library, written and maintained by me. Here is an example of usage:
String result = "Hello World";
result = StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence(result);
System.out.println(result);
result = StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString(result);
System.out.println(result);
The output of this code is:
\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064
Hello World
The library can be found at Maven Central or on GitHub. It comes as a Maven artifact, with sources and Javadoc.
Here is the Javadoc for the class StringUnicodeEncoderDecoder.
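If adding a library is not an option, a similar check can be sketched with the JDK alone. This is a different technique from the MgntUtils utility above: instead of printing Unicode codes, it runs a strict UTF-8 decoder over the raw bytes and reports the offset of the first invalid sequence. The class and method names are made up for illustration, and the sample bytes stand in for the contents of the real file (which could be loaded with Files.readAllBytes).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Probe {

    // Returns the offset of the first byte sequence that is not valid UTF-8,
    // or -1 if the whole array decodes cleanly.
    static int firstInvalidOffset(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(Math.max(1, data.length));
        CoderResult result = decoder.decode(in, out, true);
        // On error, the input position is left at the start of the bad sequence.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        // Hypothetical sample: "café x" encoded as ISO-8859-1, so é is the single byte 0xE9.
        byte[] data = "caf\u00e9 x".getBytes(StandardCharsets.ISO_8859_1);
        int offset = firstInvalidOffset(data);
        if (offset >= 0) {
            System.out.printf("invalid UTF-8 at offset %d: 0x%02X%n", offset, data[offset] & 0xFF);
        } else {
            System.out.println("valid UTF-8");
        }
    }
}
```

Knowing the offending byte value and its position makes it easier to guess the charset that actually wrote the file; a lone 0xE9, for example, is consistent with é in ISO-8859-1 or windows-1252.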