Like the title says, I'm not able to read the contents of a file (a CSV file) when running the same code in a Linux container.

    private Set<VehicleConfiguration> loadConfigurations(Path file, CodeType codeType) throws IOException {
        log.debug("File exists? " + Files.exists(file));
        log.debug("Path " + file.toString());
        log.debug("File " + file.toFile().toString());
        log.debug("File absolute path " + file.toAbsolutePath().toString());
        String line;
        Set<VehicleConfiguration> configurations = new HashSet<>(); // this way we ignore duplicates in the same file
        try (BufferedReader br = new BufferedReader(new FileReader(file.toFile()))) {
            while ((line = br.readLine()) != null) {
                configurations.add(build(line, codeType));
            }
        }
        log.debug("Loaded " + configurations.size() + " configurations");
        return configurations;
    }

The logs print "true" and the file path on both systems (locally on Windows and in a Linux Docker container). On Windows it loads "15185 configurations", but in the container it loads "0 configurations".

The file exists on Linux; I checked it myself from bash. Running head on it shows that the file has lines.

Before this I tried with Files.lines like so:

    var vehicleConfigurations = Files.lines(file)
        .map(line -> build(line, codeType))

But this has a problem (in the container only) regarding the contents. It reads the file, but not the whole file: it reaches a given line (say line 8000) and does not read it completely (it reads about half a line, stopping before the comma separator). Then I get a java.lang.ArrayIndexOutOfBoundsException because my build method tries to split the line and I access index 1 (which it doesn't have, only 0):

    private VehicleConfiguration build(String line, CodeType codeType) {
        String[] cells = line.split(lineSeparator);
        var vc = new VehicleConfiguration();
        return vc;
    }

What could be the issue? I don't understand how the same code (in Java) works on Windows but not on a Linux container. It makes no sense.

I'm using Java 11. The file is copied using volumes in a docker-compose file like this:

- ./file-sources:/file-sources

I then copy the file (using cp command on the linux container) from file-sources to /root because that's where the app is listening for new files to arrive. File contents are then read with the methods I described. Example file data (does not have weird characters):


Thanks in advance.

UPDATE: Tried with newBufferedReader method, same result (works on windows, doesn't work on linux container):

    private Set<VehicleConfiguration> loadConfigurations(Path file, CodeType codeType) throws IOException {
        String line;
        Set<VehicleConfiguration> configurations = new HashSet<>(); // this way we ignore duplicates in the same file
        try (BufferedReader br = Files.newBufferedReader(file)) {
            while ((line = br.readLine()) != null) {
                configurations.add(build(line, codeType));
            }
        }
        log.debug("Loaded " + configurations.size() + " configurations");
        return configurations;
    }

wc -l in the linux container (in /root) returns: 15185 hard_001.csv

Update: This is not a solution, but I found out that if I drop the files directly into the file-sources folder and make that the folder the code listens to, the files are read. So the problem seems to be tied to using cp/mv inside the container to move the file to another folder. Maybe the file is read before it is fully copied/moved, and that's why it reads 0 configurations?


Score: 4


There are a few methods in Java you should never use. Ever.

new FileReader(File) is one of them.

Any time that you have a thing that represents bytes and somehow chars or Strings fall out, or vice versa? Don't ever use those, unless the spec of said method explicitly points out that it always uses a pre-set charset. Almost all such methods use the 'system default charset' which means that the operation depends on the machine you run it on. That is shorthand for 'this will fail, and your tests won't catch it'. Which you don't want.

Which is why you should never use these things.
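
To make the machine dependence concrete, here is a minimal sketch (the class name and sample bytes are made up for illustration, they are not from your app). It decodes the same four bytes with two explicit charsets; a method that silently picks the system default would give you one result or the other depending on where it runs:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the asker's code.
class DefaultCharsetDemo {
    // Decodes the given bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // 0xE9 is e-acute in ISO-8859-1, but on its own it is not valid UTF-8.
        byte[] bytes = {'c', 'a', 'f', (byte) 0xE9};

        // This is the charset that charset-less APIs would pick on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Same bytes, different text, depending only on the charset.
        System.out.println("ISO_8859_1: " + decode(bytes, StandardCharsets.ISO_8859_1));
        System.out.println("UTF_8:      " + decode(bytes, StandardCharsets.UTF_8));
    }
}
```

Decoded as UTF-8, the last byte comes out as the replacement character U+FFFD instead of the letter you expected, and nothing throws.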

FileReader has been fixed (there is a second constructor that takes a charset), but that's only since JDK 11. You already have the nice new API, so why switch back to the dinky old File API? Don't do that.

All the various methods in Files, such as Files.newBufferedReader, are specced to do UTF-8 if you don't specify (in that way, Files is more useful, and unlike most other java core libraries). Thus:

try (BufferedReader br = Files.newBufferedReader(file)) {

which is just.. better.. than your line.

Now, it'll probably still fail on you. But that's good! It'll also fail on your dev machine. Most likely, the file you are reading is not, in fact, in UTF-8. This is the likely guess; most linuxen are deployed with a UTF-8 default charset, and most dev machines are not; if your dev machine is working and your deployment environment isn't, the obvious conclusion is that your input file is not UTF-8. It does not need to be what your dev machine has as a default either; something like ISO_8859_1 will never throw exceptions, but it will read gobbledygook instead. Your code may seem to work (no crashes), but the text you read is still incorrect.

Figure out what text encoding you got, and then specify it. If it's ISO_8859_1, for example:

try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {

and now your code no longer has the 'works on some machines but not on others' nature.
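
Here is a rough, self-contained sketch of that difference. Everything in it is an assumption for illustration: the class name, the helper names, and the two-line sample file it writes to a temp location. It reads the same ISO-8859-1 encoded file twice, once with the right charset and once as UTF-8:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of the asker's code.
class StrictDecodingDemo {
    // Counts lines using an explicit charset. Files.newBufferedReader reports
    // malformed bytes with an exception instead of silently replacing them.
    static int countLines(Path file, Charset charset) throws IOException {
        int count = 0;
        try (BufferedReader br = Files.newBufferedReader(file, charset)) {
            while (br.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    // Writes a made-up two-line sample file encoded as ISO-8859-1.
    static Path writeSample() throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        Files.write(file, "A1,caf\u00e9\nB2,plain\n".getBytes(StandardCharsets.ISO_8859_1));
        return file;
    }

    public static void main(String[] args) throws IOException {
        Path file = writeSample();
        try {
            System.out.println("ISO_8859_1 lines: " + countLines(file, StandardCharsets.ISO_8859_1));
            try {
                countLines(file, StandardCharsets.UTF_8);
            } catch (MalformedInputException e) {
                // Same failure on every machine, which is the point.
                System.out.println("UTF_8 failed: " + e);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The UTF-8 read fails loudly and identically everywhere, which is much easier to debug than a reader that quietly hands you different text per machine.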

Inspect the line where it fails, in a hex editor if you have to. I bet you dollars to donuts there will be a byte there which is 0x80 or higher (in decimal, 128 or higher). Everything up to and including 127 tends to mean the exact same thing in a wide variety of text encodings, from ASCII to any ISO-8859 variant to UTF-8 to Windows Cp1252 to MacRoman to so many other things, so as long as it's all just plain letters and digits, having the wrong encoding is not going to make any difference. But once you get to 0x80 or higher they're all different. Armed with that byte, plus some knowledge of what character it is supposed to be, you usually have a good start in figuring out what encoding that text file is in.
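
If you don't have a hex editor handy in the container, something along these lines would do the same job. It's only a sketch: the class and method names are mine, and it loads the whole file into memory, which is fine for a file of this size but not for huge ones:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the asker's code.
class NonAsciiFinder {
    // Returns the offset of the first byte >= 0x80, or -1 if the data is pure ASCII.
    static int firstNonAscii(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if ((data[i] & 0xFF) >= 0x80) {
                return i;
            }
        }
        return -1;
    }

    // Returns the 1-based line number containing the given byte offset.
    static int lineNumberAt(byte[] data, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (data[i] == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NonAsciiFinder <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        int offset = firstNonAscii(data);
        if (offset < 0) {
            System.out.println("pure ASCII; the charset is not your problem");
        } else {
            System.out.printf("byte 0x%02X at offset %d (line %d)%n",
                    data[offset] & 0xFF, offset, lineNumberAt(data, offset));
        }
    }
}
```

If it reports pure ASCII, the encoding theory is off the table and the copy step becomes the prime suspect.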

NB: If this isn't it, check how the text file is being copied from your dev machine to your deployment environment. Are you sure it is the same file? If it's being copied through a textual mechanism, charset encoding again can be to blame, but this time in how the file is written, instead of how your java app reads it.
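
One way to check the "same file" question is to print the size and a hash of the file on both sides and compare them. This is a sketch with made-up names; I'm sticking to String.format for the hex output since you're on Java 11. A different size right after the cp would also point at the file being picked up before the copy has finished:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the asker's code.
class FileFingerprint {
    // Returns the SHA-256 digest of the data as lowercase hex.
    static String sha256Hex(byte[] data) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException, NoSuchAlgorithmException {
        if (args.length != 1) {
            System.err.println("usage: java FileFingerprint <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Path.of(args[0]));
        // Run this on the dev machine and in the container, then compare the output.
        System.out.println(data.length + " bytes, sha256=" + sha256Hex(data));
    }
}
```

If the two outputs match, the file made it across intact and the problem is in how it is read; if they differ, look at the copy step first.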

