Manipulating BufferedReader before it is read by OpenCSV's CSVReaderBuilder results in CSVReaderBuilder = null
Question
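This question is about reading a CSV file with the OpenCSV library. Some lines in the file are missing a column, so reading it through CSVReaderBuilder fails. An attempt to add the extra column by manipulating the BufferedReader before it is passed to CSVReaderBuilder still leaves reader returning null.
The failure reported for the original code is a com.opencsv.exceptions.CsvRequiredFieldEmptyException, which occurs because the lines do not all have the same number of columns.
One suggestion is to make sure every line of the CSV file has the same number of columns. If the file itself cannot be changed, it can be preprocessed before parsing to add the missing column. A possible solution looks like this: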
FileInputStream is;
try {
    is = new FileInputStream(fileName);
    InputStreamReader isr = new InputStreamReader(is, charSet);
    BufferedReader buffReader = new BufferedReader(isr);
    List<String> modifiedLines = new ArrayList<>();
    buffReader.lines().forEach(line -> {
        // Limit -1 keeps trailing empty columns, so a line that already ends in ";" is counted correctly
        String[] columns = line.split(";", -1);
        if (columns.length == 13) {
            // Add an extra semicolon to the line to make it 14 columns
            line = line + ";";
        }
        modifiedLines.add(line);
    });
    // Create a new BufferedReader with the modified lines
    BufferedReader modifiedBuffReader = new BufferedReader(new StringReader(String.join("\n", modifiedLines)));
    // Use your own CSVParser to set the separator
    final CSVParser parser = new CSVParserBuilder()
            .withSeparator(separator)
            .build();
    // Use your own CSVReader with your modified BufferedReader and CSVParser
    reader = new CSVReaderBuilder(modifiedBuffReader)
            .withCSVParser(parser)
            .build();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
This way every line has the same number of columns before it is parsed, which should avoid the CsvRequiredFieldEmptyException and keep reader from being null.
Original question:
I am reading a CSV file with OpenCSV's CSVReaderBuilder, which doesn't work because the CSV file, for some weird reason I cannot change, has some lines with a missing column.
So I thought it would be a good idea to manipulate the BufferedReader I use as input for the CSVReaderBuilder and add an extra column before it is read by CSVReaderBuilder. Unfortunately, the CSVReaderBuilder will then always return null.
This code results in a com.opencsv.exceptions.CsvRequiredFieldEmptyException, as the lines have different numbers of columns, but it works with a proper CSV file:
FileInputStream is;
try {
    is = new FileInputStream(fileName);
    InputStreamReader isr = new InputStreamReader(is, charSet);
    BufferedReader buffReader = new BufferedReader(isr);
    // use own CSVParser to set separator
    final CSVParser parser = new CSVParserBuilder()
            .withSeparator(separator)
            .build();
    // use own CSVReader make use of own CSVParser
    reader = new CSVReaderBuilder(buffReader)
            .withCSVParser(parser)
            .build();
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
So I added the code to manipulate the BufferedReader to add an extra semicolon if the column count is 13 instead of 14, but this will result in reader being null.
FileInputStream is;
try {
    is = new FileInputStream(fileName);
    InputStreamReader isr = new InputStreamReader(is, charSet);
    BufferedReader buffReader = new BufferedReader(isr);
    buffReader.lines().forEach(t -> {
        String a[] = t.split(";");
        int occurence = a.length;
        if(occurence == 13) {
            t = t.concat(";");
        }
    });
    // use own CSVParser to set separator
    final CSVParser parser = new CSVParserBuilder()
            .withSeparator(separator)
            .build();
    // use own CSVReader make use of own CSVParser
    reader = new CSVReaderBuilder(buffReader)
            .withCSVParser(parser)
            .build();
} catch (FileNotFoundException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
Does anyone have an idea what I'm doing wrong here?
Answer 1
Score: 0
There are a couple of problems here:
First, by the time buffReader is used in new CSVReaderBuilder(buffReader), it has already been fully consumed by buffReader.lines().forEach. A BufferedReader can only be read once, in general. A solution could ordinarily be to create a new InputStreamReader and BufferedReader on the same file, except in this case, you'll run into the second problem.
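To see the first problem in isolation, here is a minimal sketch that uses only the JDK. The class and method names (ReaderConsumedDemo, readAfterStream) are made up for illustration, and a StringReader stands in for the file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderConsumedDemo {
    // Drains the reader with lines().forEach, then attempts one more read.
    static String readAfterStream(String content) {
        BufferedReader reader = new BufferedReader(new StringReader(content));
        int[] lineCount = {0};
        reader.lines().forEach(line -> lineCount[0]++);
        try {
            // The reader is at end of input now, so readLine() returns null.
            return lineCount[0] + ":" + reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAfterStream("a;b\nc;d\n")); // prints "2:null"
    }
}
```

Anything that reads from the same BufferedReader afterwards, including a CSV reader built on top of it, starts at end of input in the same way.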
The line t = t.concat(";"); does not work the way you expect. All this does is reassign the local variable t, which isn't used again. It does not change the contents of the file or the contents of the reader.
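The difference between reassigning t and actually keeping the modified text can be sketched as follows. This is an illustration only: LambdaReassignDemo and its two methods are hypothetical names, and a plain list of strings stands in for the lines of the file:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class LambdaReassignDemo {
    // Reassigning the lambda parameter only changes the local variable t.
    static List<String> reassignOnly(List<String> lines) {
        lines.forEach(t -> {
            t = t.concat(";"); // the new string is discarded when the lambda returns
        });
        return lines; // unchanged
    }

    // Mapping and collecting keeps the modified strings in a new list.
    static List<String> mapAndCollect(List<String> lines) {
        return lines.stream()
                .map(t -> t.concat(";"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a;b", "c;d");
        System.out.println(reassignOnly(lines));  // prints "[a;b, c;d]"
        System.out.println(mapAndCollect(lines)); // prints "[a;b;, c;d;]"
    }
}
```

Strings are immutable in Java, so concat always returns a new object. The modified value has to be stored somewhere, for example in a list or a writer, to have any effect.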
How to fix this is less straightforward. As far as I know, this exception will only be thrown when binding the CSV data to a bean, and only if fields are marked as required = true. Given that the source data does not always contain data for the last field, it seems like it should not be marked as required.
If manipulating the source data really is your only option, I can think of a few possible approaches:
- Write the modified data back to a temporary file and then read that file with the CSV parser.
- If the CSV file is small enough to fit into memory, you could write the modified data to a StringWriter, and then construct a StringReader with the result, and parse that.
- Do the file content rewriting and CSV parsing in separate threads, using PipedOutputStream and PipedInputStream to connect them.
- Write a custom implementation of FilterReader that transforms the file contents as they are read (not the most straightforward to implement).
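As a rough illustration of the StringWriter/StringReader approach from the list above, here is a sketch that uses only the JDK. The names PadCsvLines, padShortLines and readAll are hypothetical, and the column count is deliberately naive: it assumes no quoted field contains the separator. The returned Reader could then presumably be passed to new CSVReaderBuilder(...) in place of buffReader from the question:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.regex.Pattern;

class PadCsvLines {
    // Returns a Reader over an in-memory copy of the input in which every line
    // with exactly (expectedColumns - 1) columns gets one separator appended.
    static Reader padShortLines(Reader source, char separator, int expectedColumns) {
        StringWriter out = new StringWriter();
        String separatorRegex = Pattern.quote(String.valueOf(separator));
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Naive count that ignores quoting; limit -1 keeps trailing empty columns.
                int columns = line.split(separatorRegex, -1).length;
                out.write(line);
                if (columns == expectedColumns - 1) {
                    out.write(separator);
                }
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new StringReader(out.toString());
    }

    // Helper for the demo: reads everything from a Reader into a String.
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Reader padded = padShortLines(new StringReader("a;b;c\nd;e\n"), ';', 3);
        System.out.print(readAll(padded)); // prints "a;b;c" and "d;e;" on separate lines
    }
}
```

Because the whole file is held in memory, this only suits files that are small enough. The temporary-file variant would follow the same loop but write to a file instead of the StringWriter.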
Details of implementing these approaches would be too long and broad for this answer, so I would suggest creating follow-up questions if needed.
There might be additional options specific to the OpenCSV library that I'm not aware of.