Why does JUnit not work with files containing non-English characters? (using NetBeans)

Question

I made a program in NetBeans that takes an input .txt file and then writes output to the console.
It works fine, but when I try to test it using JUnit, the program reads the file incorrectly.

For example, instead of 'ö' it reads garbled characters such as 'Ã¶'.

Is there any way to solve this problem of JUnit not reading non-English characters?

Answer 1

Score: 0

I suspect that the problem is actually in your program or unit tests, not in JUnit.

If the evidence is as you say it is, then I expect that your code does something like this:

Reader r = new FileReader(filename);

which opens the file and sets up a charset decoder based on the default charset.

  • When you run the code in NetBeans, the default charset is UTF-8, so the file (which is UTF-8 encoded) is read correctly.

  • When you run it in the context of a JUnit test, the default charset is (apparently) LATIN-1 (ISO-8859-1), which doesn't match the encoding of the input file.
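The mismatch described in the two cases above can be reproduced without any file or test framework. The following is a minimal sketch, not the asker's code: the class name `MojibakeDemo` and its methods are hypothetical. It encodes 'ö' as UTF-8 and then decodes the same bytes once with ISO-8859-1 and once with UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it is not part of the original question or answer.
class MojibakeDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the wrong charset (ISO-8859-1).
    static String decodeUtf8BytesAsLatin1(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Encodes and decodes with the same charset, so the text survives the round trip.
    static String decodeUtf8BytesAsUtf8(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'ö' is written as a Unicode escape so the encoding of this source file does not matter.
        String original = "\u00f6";

        // One character in, two characters out: the two UTF-8 bytes are each read as a Latin-1 character.
        System.out.println(decodeUtf8BytesAsLatin1(original).length());

        // Same charset on both sides: the string is unchanged.
        System.out.println(decodeUtf8BytesAsUtf8(original).equals(original));
    }
}
```

The two-character result is the kind of garbling reported in the question, which is why a Latin-1 default charset is the likely explanation.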

It is possibly incorrect for your code to rely on the default charset as the encoding of its input file. Alternatively, it could be that your JUnit test setup is incorrect because it does not arrange for the JVM default charset to match the test file's encoding.

The way to open this file with a specific charset (UTF-8) is:

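import java.io.*;
import java.nio.charset.StandardCharsets;

// Java 11 and later: FileReader accepts an explicit charset
Reader r = new FileReader(filename, StandardCharsets.UTF_8);

// Java 10 and earlier: wrap a FileInputStream and name the charset on the reader
Reader r = new InputStreamReader(new FileInputStream(filename), StandardCharsets.UTF_8);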
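For a self-contained check, the same idea can be sketched with `Files.newBufferedReader`, which also takes an explicit charset. This is an illustration under assumptions: the class `ExplicitCharsetRead`, its methods, and the temporary file are hypothetical stand-ins for the asker's program and input file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; the real program and its input file are not shown in the question.
class ExplicitCharsetRead {

    // Reads the first line of the file, always decoding it as UTF-8
    // regardless of the JVM default charset.
    static String readFirstLineUtf8(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }

    // Writes the text to a temporary file as UTF-8 and reads it back with readFirstLineUtf8.
    static String writeThenReadUtf8(String text) {
        try {
            Path file = Files.createTempFile("charset-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return readFirstLineUtf8(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String expected = "sch\u00f6n"; // contains 'ö', written as an escape
        System.out.println(writeThenReadUtf8(expected).equals(expected));
    }
}
```

Because the charset is named on both the write and the read, the result should not depend on whether the code runs from NetBeans directly or inside a JUnit run.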

You can't change a running JVM's default charset. But you may be able to override the platform default charset with a JVM option when you start the JVM that runs the JUnit tests. (See https://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding.)
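To confirm which default charset each environment actually uses, one option is to print it from both the normal run and the JUnit run and compare. The sketch below is an illustration only; the class name `DefaultCharsetReport` is hypothetical. The override usually suggested for JVM startup is the system property `-Dfile.encoding=UTF-8`; where to set it depends on how the NetBeans project launches its tests, which the question does not say.

```java
import java.nio.charset.Charset;

// Hypothetical helper; call it from both the program and a test to compare environments.
class DefaultCharsetReport {

    // Describes the charset that readers and writers use when none is specified.
    static String describeDefaultCharset() {
        return "default charset = " + Charset.defaultCharset().name()
                + ", file.encoding = " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describeDefaultCharset());
    }
}
```

If the two runs report different values, that supports the explanation above; if they report the same value, the input side is probably not where the problem lies.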


It is also possible that you have misinterpreted the evidence and the encoding problem is actually on the output side; i.e., in the context where you run the JUnit tests, there is a mismatch between the default charset and the console's actual charset.
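The output-side possibility can be illustrated in the same way. This is a sketch under assumptions, with a hypothetical class `OutputEncodingDemo`: it shows that the bytes a `PrintStream` emits for 'ö' depend on the charset the stream was created with, and that `System.out` can be wrapped in a stream with an explicit charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical demo class; it does not reproduce the asker's console setup.
class OutputEncodingDemo {

    // Prints the text through a PrintStream that encodes with the named charset
    // and returns the raw bytes that were emitted.
    static byte[] printWithCharset(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charsetName)) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u00f6"; // 'ö', written as an escape

        // The same character becomes a different number of bytes under each charset.
        System.out.println(printWithCharset(text, "UTF-8").length);
        System.out.println(printWithCharset(text, "ISO-8859-1").length);

        // Wrapping System.out fixes the encoding of the program's output to UTF-8;
        // whether it displays correctly still depends on what the console expects.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(text);
    }
}
```

If the console interprets bytes with a different charset than the one the stream writes, correctly read text can still appear garbled, so checking the decoded string in the test itself (rather than the printed output) helps separate the two cases.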

huangapple
  • Posted on October 1, 2020, 21:33:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64156501.html