OpenCSV reader strips ending quotes instead of ignoring them

Question

Say I have the following data:

id,description,amount
1,Foo "bar",10.5
2,Quick "brown" fox,9.0

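I know that proper CSV should have `Foo "bar"` escaped as `Foo ""bar""` to preserve the quotes. But this is the data I have to deal with 🤷, and it is not something I can modify before processing.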

```java
// Uses com.opencsv.CSVReader, CSVReaderBuilder and CSVParserBuilder.
try (CSVReader csvReader = new CSVReaderBuilder(new FileReader(resourcePath))
        .withSkipLines(1) // skip the header row
        .withCSVParser(new CSVParserBuilder().withIgnoreQuotations(true).build())
        .build()) {
    String[] line;
    ..
```

To solve this, I tried to make CSVReader ignore all quotes with `withIgnoreQuotations(true)`, but it seems to strip the last quotation mark instead of ignoring it, so the output ends up as:

1,Foo "bar,10.5

Is there any way to get the output below with OpenCSV?

1,Foo "bar",10.5



# Answer 1
**Score**: 0

You can use something like the following:

new CSVParserBuilder().withQuoteChar('§').build();

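Obviously this is not ideal, as you have to choose a character that is guaranteed never to appear in your data. I chose the section symbol `§`; that may not work for you.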

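For reference, the Apache Commons CSV parser does not exhibit this behavior:

```xml
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>1.8</version>
</dependency>
```

And:

```java
String sampleRecord = "1,Foo \"bar\",10.5";
CSVParser parser = CSVParser.parse(sampleRecord, CSVFormat.DEFAULT);

// Print the second column (description) of each record.
for (CSVRecord record : parser) {
    System.out.println(record.get(1));
}
```

This prints: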

Foo "bar"
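If the data never relies on quoting at all, the effect both workarounds aim for (every `"` treated as an ordinary character) can also be sketched with the standard library alone. This is only an illustration, not OpenCSV or Commons CSV behavior: the class name `LiteralQuoteSplitter` and its `split` method are hypothetical, and the sketch assumes that no field contains a comma or a line break.

```java
import java.util.List;

// Hypothetical helper, not part of OpenCSV or Commons CSV.
final class LiteralQuoteSplitter {

    // Splits one record on commas and leaves every double quote untouched.
    // Assumption: no field contains a comma or a line break.
    static String[] split(String record) {
        // A limit of -1 keeps trailing empty fields, e.g. in "a,b,".
        return record.split(",", -1);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "id,description,amount",
                "1,Foo \"bar\",10.5",
                "2,Quick \"brown\" fox,9.0");
        // Skip the header row, like withSkipLines(1) in the question.
        for (String line : lines.subList(1, lines.size())) {
            System.out.println(String.join(",", split(line)));
        }
    }
}
```

For the sample data this keeps `Foo "bar"` intact in the second field. It stops being safe as soon as a field can legitimately contain a comma, which is the case a real CSV parser with a substitute quote character still handles.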





Posted by huangapple on August 21, 2020, at 01:25:23
Original link: https://go.coder-hub.com/63510275.html