May 25, 2023 05:32:55


Apache CSV Commons: IOException: (line 5) invalid char between encapsulated token and delimiter

Question


The Apache Commons CSV library fails with this exception:

IOException: (line 5) invalid char between encapsulated token and delimiter

It happens while reading the following .csv file:

"id", "category_id", "brand_id", "catalog_number"
"6427146", "4045", "764\"13", "A26-30-01"
"6425052", "4058", "764\"13", "P9B02VN"
"6424406", "4054", "764\"13", "A40-30-10-80"
"6152302", "4046", "764\"13", "1.75\\" center distance"
"6152301", "4046", "764\"13", "ZL110"
"6152300", "4046", "764\"13", "ZAF460-70"
"6152299", "4046", "764\"13", "ZA75-84"
"6152298", "4046", "764\"13", "ZA75-80"
"6152297", "4046", "764\"13", "ZA75-55-1SBN153510R5506"

The library cannot read line 5 (the 1.75\\" center distance record).
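To see why it is line 5 in particular that fails, here is a deliberately simplified model of reading one quoted field with a backslash as the escape character. It is not Commons CSV's actual lexer, and the class and method names are hypothetical. With withEscape('\\'), \" is read as a literal quote (so "764\"13" is fine), but \\ is read as a single literal backslash, so the quote right after it closes the field. Roughly speaking, after a closing quote the parser only skips whitespace and then expects a delimiter or end of line; on line 5 it finds "center distance" instead, which is the "invalid char between encapsulated token and delimiter" error.

```java
public class QuotedFieldDemo {
    // Simplified model (NOT Commons CSV's lexer) of reading ONE quoted field that starts
    // at index 0, with '\' escaping the next character (\" -> " and \\ -> \).
    // Returns {value, text remaining after the closing quote}.
    static String[] readQuotedField(String s) {
        if (s.isEmpty() || s.charAt(0) != '"') {
            throw new IllegalArgumentException("field must start with a quote");
        }
        StringBuilder value = new StringBuilder();
        int i = 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                value.append(s.charAt(i + 1)); // escaped character is taken literally
                i += 2;
            } else if (c == '"') {
                return new String[] {value.toString(), s.substring(i + 1)};
            } else {
                value.append(c);
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated quoted field");
    }

    // After the closing quote, only whitespace may come before a comma or the end of line;
    // any other character is what the parser reports as an "invalid char".
    static boolean endsCleanly(String rest) {
        String t = rest.stripLeading();
        return t.isEmpty() || t.charAt(0) == ',';
    }

    public static void main(String[] args) {
        String[] ok = readQuotedField("\"764\\\"13\", \"A26-30-01\"");
        String[] bad = readQuotedField("\"1.75\\\\\" center distance\"");
        System.out.println(ok[0] + " / clean: " + endsCleanly(ok[1]));   // prints: 764"13 / clean: true
        System.out.println(bad[0] + " / clean: " + endsCleanly(bad[1])); // prints: 1.75\ / clean: false
    }
}
```

In other words, the file seems to encode a literal quote inside a value as \\" on line 5, which no escape setting can tell apart from "a literal backslash followed by the closing quote".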

Code:

@Test
public void testReadCsvFile() throws IOException {
    Reader reader = new FileReader("products-with-escaped-escape-symbol.csv");

    Iterable<CSVRecord> records = CSVFormat.DEFAULT
            .withHeader(HEADERS)
            .withFirstRecordAsHeader()
            .withIgnoreHeaderCase()
            .withIgnoreSurroundingSpaces()
            .withEscape('\\')
            .parse(reader);

    for (CSVRecord record : records) {
        String brandId = record.get("brand_id");
        assertThat(brandId, is("764\"13"));
    }
}

Is there a way to configure the Apache Commons CSV library to read a .csv file with this structure?

Answer 1

Score: 1


The best thing you can do is let the source of this data know that they have provided you with invalid input. They were supposed to send a CSV file, and they failed to do that. This input is not a CSV file; it merely resembles one.

If that option is not available, you can put your own Reader in front of the CSVParser that filters each line before the parser sees it:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.PipedReader;
import java.io.PipedWriter;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

List<CSVRecord> records;

CSVFormat format = CSVFormat.DEFAULT
        .withHeader(HEADERS)
        .withFirstRecordAsHeader()
        .withIgnoreHeaderCase()
        .withIgnoreSurroundingSpaces()
        .withEscape('\\');

// The pipe connects a background filter task (writer side) to the CSVParser (reader side).
try (PipedReader reader = new PipedReader();
     PipedWriter writer = new PipedWriter(reader)) {

    Runnable filterTask = () -> {
        try (BufferedReader fileReader = Files.newBufferedReader(
                Path.of("products-with-escaped-escape-symbol.csv"));
             PipedWriter filteredWriter = writer) {

            String line;
            while ((line = fileReader.readLine()) != null) {
                // Rewrite \\" to \" unless it is immediately followed by a comma,
                // so that the quote becomes an escaped, literal quote.
                line = line.replaceAll(
                    "^(" +
                    "(?:\\s*\"(?:[^\"\\\\]|\\\\.)*\"\\s*,)*" +
                    "\\s*\"(?:[^\"\\\\]|\\\\[^\\\\])*" +
                    ")" +
                    "\\\\\\\\\"($|\\s*[^,])", "$1\\\\\"$2");
                filteredWriter.write(line);
                filteredWriter.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    };

    // Run the filter on another thread: using both ends of a pipe from one thread can deadlock.
    CompletableFuture<?> filter = CompletableFuture.runAsync(filterTask);

    try (CSVParser parser = format.parse(reader)) {
        records = parser.getRecords();
        filter.get(); // rethrows any exception from the filter task
    } catch (ExecutionException e) {
        Throwable cause = e.getCause();
        if (cause instanceof RuntimeException re) {
            throw re;
        } else {
            throw new RuntimeException(cause);
        }
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }
} catch (IOException e) {
    throw new UncheckedIOException(e);
}

This is not a completely reliable solution. If a comma follows \\" and that comma is meant to be part of the value rather than a value separator, recognizing that case would take considerably more complicated code.
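The filter's regular expression can be checked on its own, without Commons CSV or a real file. The sketch below is an illustration under stated assumptions, not part of the answer: it copies the answer's pattern and replacement into a hypothetical fixLine helper, and adds a hypothetical filteredReader that, assuming the whole file fits in memory, fixes every line up front and returns a StringReader, which avoids the pipe and the background thread. The comma case shows the limitation just described: a \\" immediately followed by a comma is left untouched.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

public class InMemoryCsvFilter {
    // Same pattern and replacement as the answer: a \\" that is not immediately followed
    // by a comma becomes \", i.e. an escaped (literal) quote. Because of the ^ anchor,
    // at most one occurrence per line is rewritten.
    static String fixLine(String line) {
        return line.replaceAll(
                "^(" +
                "(?:\\s*\"(?:[^\"\\\\]|\\\\.)*\"\\s*,)*" +
                "\\s*\"(?:[^\"\\\\]|\\\\[^\\\\])*" +
                ")" +
                "\\\\\\\\\"($|\\s*[^,])", "$1\\\\\"$2");
    }

    // Hypothetical helper: fixes every line in memory and returns a Reader over the result.
    // Assumes the input fits in memory; I/O errors surface as UncheckedIOException from lines().
    static Reader filteredReader(BufferedReader in) {
        String fixed = in.lines()
                .map(line -> fixLine(line) + "\n")
                .collect(Collectors.joining());
        return new StringReader(fixed);
    }

    public static void main(String[] args) throws IOException {
        String line5 = "\"6152302\", \"4046\", \"764\\\"13\", \"1.75\\\\\" center distance\"";
        String commaCase = "\"6152302\", \"4046\", \"764\\\"13\", \"1.75\\\\\", wide\"";
        try (BufferedReader fixed = new BufferedReader(
                filteredReader(new BufferedReader(new StringReader(line5 + "\n" + commaCase))))) {
            // First line: 1.75\\" is rewritten to 1.75\"; second line is left unchanged.
            fixed.lines().forEach(System.out::println);
        }
    }
}
```

In the answer's program, the Reader returned by filteredReader(Files.newBufferedReader(Path.of("products-with-escaped-escape-symbol.csv"))) could be passed to format.parse(...) in place of the PipedReader. The trade-off is memory: the piped version streams the file, while this one holds the whole fixed text at once.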
