Manipulating BufferedReader before it is read by OpenCSV's CSVReaderBuilder results in CSVReaderBuilder = null


Question

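This question is about processing a CSV file with the OpenCSV library. The original problem was that some lines in the CSV file are missing a column, so reading the file through CSVReaderBuilder failed. The asker then tried to add the missing column by manipulating the BufferedReader before it is read, but reader still ended up null.

The com.opencsv.exceptions.CsvRequiredFieldEmptyException appears because the short rows lack a value for the last column. Note, though, that CSVReaderBuilder itself does not require every row to have the same number of columns; as the answer below points out, this exception is normally thrown when binding rows to a bean whose fields are marked required = true.

The suggested fix is to make sure every line of the CSV file has the same number of columns. If the file itself cannot be changed, it can be preprocessed before parsing to add the missing column. Here is code for one possible solution: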

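import com.opencsv.CSVParser;
import com.opencsv.CSVParserBuilder;
import com.opencsv.CSVReaderBuilder;
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// fileName, charSet, separator and a CSVReader variable named reader are assumed to be
// declared elsewhere, as in the question; the padding below assumes ';' is the separator.
List<String> modifiedLines = new ArrayList<>();
try (BufferedReader buffReader = new BufferedReader(
        new InputStreamReader(new FileInputStream(fileName), charSet))) {
    buffReader.lines().forEach(line -> {
        // Limit -1 keeps trailing empty fields, so "a;b;" counts as 3 columns, not 2
        String[] columns = line.split(";", -1);
        if (columns.length == 13) {
            // Add an extra semicolon to the line to make it 14 columns
            line = line + ";";
        }
        modifiedLines.add(line);
    });
} catch (IOException e) {
    // Also covers FileNotFoundException; try-with-resources closes the file either way
    e.printStackTrace();
}

// Create a new BufferedReader over the modified lines (buffReader is already consumed)
BufferedReader modifiedBuffReader = new BufferedReader(new StringReader(String.join("\n", modifiedLines)));

// Use your own CSVParser to set the separator
final CSVParser parser = new CSVParserBuilder()
        .withSeparator(separator)
        .build();

// Use your own CSVReader with the modified BufferedReader and CSVParser
reader = new CSVReaderBuilder(modifiedBuffReader)
        .withCSVParser(parser)
        .build();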

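This way every line has the same number of columns before it is parsed, which avoids the CsvRequiredFieldEmptyException, and reader is built from the padded lines instead of from an already exhausted reader.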
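A padding approach like this depends on counting columns with String.split, which has a pitfall: by default split drops trailing empty strings, so a 14-column row whose last field is empty ("...;") is counted as 13 and would get a second semicolon. A minimal sketch of a safer count follows; the ColumnCounter class is hypothetical (not part of OpenCSV), and it still ignores quoted fields that contain the separator.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for counting separator-delimited columns in one raw line. */
class ColumnCounter {
    // Naive count: a separator inside a quoted field is counted as a column break too.
    static int countColumns(String line, char separator) {
        // Pattern.quote guards against separators that are regex metacharacters (e.g. '|').
        // A limit of -1 keeps trailing empty strings, so "a;b;" yields 3 columns, not 2.
        return line.split(Pattern.quote(String.valueOf(separator)), -1).length;
    }
}
```

For comparison, "a;b;".split(";").length is 2, while ColumnCounter.countColumns("a;b;", ';') is 3.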

Original question:

I am reading a CSV file with OpenCSV's CSVReaderBuilder, which doesn't work because the CSV file (which, for some weird reason, I cannot change) has some lines with a missing column.

So I thought it would be a good idea to manipulate the BufferedReader I use as input for the CSVReaderBuilder and add the extra column before it is read by CSVReaderBuilder, but unfortunately the CSVReaderBuilder always returns null.

This code results in a com.opencsv.exceptions.CsvRequiredFieldEmptyException because the lines have different numbers of columns, but it works with a proper CSV file:

		FileInputStream is;
		try {
			is = new FileInputStream(fileName);
			InputStreamReader isr = new InputStreamReader(is, charSet);
			BufferedReader buffReader = new BufferedReader(isr);

			// use own CSVParser to set separator
			final CSVParser parser = new CSVParserBuilder()
					.withSeparator(separator)
					.build();

			// use own CSVReader make use of own CSVParser
			reader = new CSVReaderBuilder(buffReader)
					.withCSVParser(parser)
					.build();


		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}

So I added code that manipulates the BufferedReader, appending an extra semicolon when the column count is 13 instead of 14, but now reader ends up null.

		FileInputStream is;
		try {
			is = new FileInputStream(fileName);
			InputStreamReader isr = new InputStreamReader(is, charSet);
			BufferedReader buffReader = new BufferedReader(isr);

			buffReader.lines().forEach(t -> {
				String a[] = t.split(";");
				int occurence = a.length;

				if(occurence == 13) {
					t = t.concat(";");
				}
			});

			// use own CSVParser to set separator
			final CSVParser parser = new CSVParserBuilder()
					.withSeparator(separator)
					.build();

			// use own CSVReader make use of own CSVParser
			reader = new CSVReaderBuilder(buffReader)
					.withCSVParser(parser)
					.build();


		} catch (FileNotFoundException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}

Does anyone have an idea what I'm doing wrong here?

Answer 1

Score: 0


There are a couple of problems here:

First, by the time buffReader is used in new CSVReaderBuilder(buffReader), it has already been fully consumed by buffReader.lines().forEach. A BufferedReader can only be read once, in general. A solution could ordinarily be to create a new InputStreamReader and BufferedReader on the same file, except in this case, you'll run into the second problem.
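A small stdlib-only demonstration of this first point, using a hypothetical ReadOnceDemo class and an in-memory StringReader in place of the file: once the terminal forEach has pulled every line through lines(), the reader is in practice at end of input, so the next readLine() returns null. The BufferedReader.lines() Javadoc makes no promise about the reader's position after the stream is consumed, so code should not try to read from it again. A CSVReader built on such an exhausted reader would most likely see no rows at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical demo: a BufferedReader is exhausted after lines() has been consumed. */
class ReadOnceDemo {
    static String firstLineAfterConsumingAll(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // The terminal forEach pulls every line out of the underlying reader...
            reader.lines().forEach(line -> { /* inspect the line */ });
            // ...so a later read (e.g. by a CSV parser) finds nothing left.
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```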

The line t = t.concat(";"); does not work the way you expect. All this does is reassign the local variable t, which isn't used again. It does not change the contents of the file or the contents of the reader.
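Because String is immutable and t is only a lambda parameter, the usual way to keep the modified text is to return it from map and collect the results. A minimal sketch, assuming ';' as the separator and a hypothetical helper named PadLines.padShortRows, that appends a semicolon to rows exactly one column short:

```java
import java.io.BufferedReader;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper that returns padded lines instead of reassigning a lambda parameter. */
class PadLines {
    static List<String> padShortRows(BufferedReader reader, int expectedColumns) {
        return reader.lines()
                // Produce a new String for short rows; the limit -1 keeps trailing empty fields
                .map(line -> line.split(";", -1).length == expectedColumns - 1 ? line + ";" : line)
                .collect(Collectors.toList());
    }
}
```

The collected list still has to be turned into a fresh Reader before a CSV parser can use it, because the original reader has been consumed.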

How to fix this is less straightforward. As far as I know, this exception will only be thrown when binding the CSV data to a bean, and only if fields are marked as required = true. Given that the source data does not always contain data for the last field, it seems like it should not be marked as required.

If manipulating the source data really is your only option, I can think of a few possible approaches:

  1. Write the modified data back to a temporary file and then read that file with the CSV parser.
  2. If the CSV file is small enough to fit into memory, you could write the modified data to a StringWriter, and then construct a StringReader with the result, and parse that.
  3. Do the file content rewriting and CSV parsing in separate threads, using PipedOutputStream and PipedInputStream to connect them.
  4. Write a custom implementation of FilterReader that transforms the file contents as they are read (not the most straightforward to implement).
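As an illustration of approach 2, here is a minimal sketch that buffers the fixed-up text in a StringWriter and hands back a StringReader. The class and method names (InMemoryFixup.padInMemory) are hypothetical, the separator is assumed to be ';', and columns are counted naively without regard to quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

/** Hypothetical in-memory fix-up: pads short rows, then exposes the result as a new Reader. */
class InMemoryFixup {
    static Reader padInMemory(Reader source, int expectedColumns) throws IOException {
        StringWriter buffer = new StringWriter();
        try (BufferedReader in = new BufferedReader(source);
             PrintWriter out = new PrintWriter(buffer)) {
            String line;
            while ((line = in.readLine()) != null) {
                boolean oneShort = line.split(";", -1).length == expectedColumns - 1;
                out.print(oneShort ? line + ";" : line);
                out.print('\n'); // explicit '\n' instead of the platform line separator
            }
        }
        // Closing a StringWriter has no effect, so its contents are still available here.
        return new StringReader(buffer.toString());
    }
}
```

The returned Reader could then be passed to new CSVReaderBuilder(...) in place of the exhausted buffReader. Since the whole file is held in memory, this only suits files that fit comfortably.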

The details of implementing these approaches would be too long and broad for this answer, so I would suggest asking follow-up questions if needed.
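For approach 4, a rough sketch of what such a FilterReader could look like. The class name LineTransformingReader and its line-at-a-time design are assumptions, not a JDK or OpenCSV feature. Because it works on whole lines, a quoted CSV field that contains a line break would be transformed piecewise, and every line ending is normalized to '\n'.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.function.UnaryOperator;

/** Hypothetical FilterReader that transforms each line lazily, as it is read. */
class LineTransformingReader extends FilterReader {
    private final BufferedReader lines;
    private final UnaryOperator<String> transform;
    private String pending = ""; // transformed text not yet handed to the caller
    private int pos = 0;         // index of the next unread char in pending
    private boolean eof = false;

    LineTransformingReader(Reader source, UnaryOperator<String> transform) {
        // FilterReader stores the wrapped reader in 'in'; the inherited close() closes it.
        super(source instanceof BufferedReader ? source : new BufferedReader(source));
        this.lines = (BufferedReader) in;
        this.transform = transform;
    }

    /** Makes sure unread characters are available; returns false at end of input. */
    private boolean fill() throws IOException {
        while (pos >= pending.length()) {
            if (eof) return false;
            String line = lines.readLine();
            if (line == null) {
                eof = true;
                return false;
            }
            pending = transform.apply(line) + "\n";
            pos = 0;
        }
        return true;
    }

    @Override
    public int read() throws IOException {
        return fill() ? pending.charAt(pos++) : -1;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        if (!fill()) return -1;
        int n = Math.min(len, pending.length() - pos);
        pending.getChars(pos, pos + n, cbuf, off);
        pos += n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && fill()) {
            int step = (int) Math.min(n - skipped, pending.length() - pos);
            pos += step;
            skipped += step;
        }
        return skipped;
    }

    // Conservative: only reports characters that have already been transformed.
    @Override
    public boolean ready() {
        return pos < pending.length();
    }

    // The inherited versions would delegate to 'in' and bypass the pending buffer.
    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark() not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset() not supported");
    }
}
```

Wrapping the file reader as new LineTransformingReader(isr, line -> ...) would let the padding happen while OpenCSV reads, without holding the whole file in memory.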

There might be additional options specific to the OpenCSV library that I'm not aware of.

huangapple
  • Published on 2023-02-14 04:50:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75441052.html