Using Jackson to convert CSV to JSON - How to remove newlines embedded in CSV column header

Question

After some quick Googling, I found an easy way to read and parse a CSV file to JSON using the Jackson library. All well and good, except ... some of the CSV header column names have embedded newlines. The program handles it, but I'm left with JSON keys with newlines embedded within. I'd like to remove these (or replace them with a space).

Here is the simple program I found:

import java.io.File;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CSVToJSON {

  public static void main(String[] args) throws Exception {
    File input = new File("PDM_BOM.csv");
    File output = new File("output.json");

    CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
    CsvMapper csvMapper = new CsvMapper();

    // Read data from CSV file
    List<Object> readAll = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
        .readAll();

    ObjectMapper mapper = new ObjectMapper();

    // Write JSON formatted data to output.json file
    mapper.writerWithDefaultPrettyPrinter().writeValue(output, readAll);

    // Write JSON formatted data to stdout
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll));
  }
}

So, as an example:

PARENT\nITEM\nNUMBER

Here's an example of what is produced:

"PARENT\nITEM\nNUMBER" : "208E8840040",

I need this to be:

"PARENT ITEM NUMBER" : "208E8840040",

Is there a configuration setting on the Jackson mapper that can handle this? Or, do I need to provide some sort of custom "handler" to the mapper?

Special cases

To add some complexity, there are cases where just replacing the newline with a space will not always yield what is needed.

Example 1:

Sometimes there is a column header like this:

QTY\nORDER/\nTRANSACTION

In this case, I need the newline removed and replaced with nothing, so that the result is:

QTY ORDER/TRANSACTION
, not
QTY ORDER/ TRANSACTION

Example 2:

Sometimes, for whatever reason, a column header has a space before the newline:

EFFECTIVE \nTHRU DATE

This needs to come out as:

EFFECTIVE THRU DATE
, not
EFFECTIVE  THRU DATE (with two spaces)

Any ideas on how to handle at least the main issue would be very much appreciated.

Answer 1

Score: 1

You can use the String replaceAll() method to replace all new lines with spaces.

String str = mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readAll);
str = str.trim().replaceAll("[\\n\\s]+", " ");
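
A minimal sketch of the same replaceAll idea applied to each header string on its own, rather than to the serialized JSON. In the JSON text an embedded newline appears as the two-character escape \n, which a whitespace pattern does not match, while the pretty-printer's indentation does match. The class name HeaderCleaner and the method normalize are hypothetical names chosen for illustration. The slash rule is an assumption based on the QTY ORDER/TRANSACTION example in the question.

```java
public class HeaderCleaner {

  // Hypothetical helper: turns a multi-line CSV header into a single-line key.
  static String normalize(String header) {
    return header
        // A line break right after a slash is dropped, with no space inserted.
        .replaceAll("/\\s*\\R\\s*", "/")
        // Any other line break, plus the whitespace around it, becomes one space.
        .replaceAll("\\s*\\R\\s*", " ")
        .trim();
  }

  public static void main(String[] args) {
    System.out.println(normalize("PARENT\nITEM\nNUMBER"));     // PARENT ITEM NUMBER
    System.out.println(normalize("QTY\nORDER/\nTRANSACTION")); // QTY ORDER/TRANSACTION
    System.out.println(normalize("EFFECTIVE \nTHRU DATE"));    // EFFECTIVE THRU DATE
  }
}
```

The \R matcher (Java 8 and later) covers \n, \r\n and other line-break sequences, so the sketch does not depend on the line endings used in the CSV file.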

Answer 2

Score: 0

OK, came up with a solution. It's ugly, but it works. Basically, after the CsvMapper finishes, I go through the giant ugly collection that's produced and do a String.replaceAll (thanks to https://stackoverflow.com/users/4402505/prem-kurian-philip for that suggestion) to remove the unwanted characters and then rebuild the map.

In any case here's the new code:

import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CSVToJSON {

  @SuppressWarnings("unchecked")
  public static void main(String[] args) throws Exception {
    File input = new File("PDM_BOM.csv");
    File output = new File("output.json");

    CsvSchema csvSchema = CsvSchema.builder().setUseHeader(true).build();
    CsvMapper csvMapper = new CsvMapper();

    // Read data from CSV file
    List<Object> readData = csvMapper.readerFor(Map.class).with(csvSchema).readValues(input)
        .readAll();

    for (Object mapObj : readData) {
      LinkedHashMap<String, String> map = (LinkedHashMap<String, String>) mapObj;
      List<String> deleteList = new ArrayList<>();
      LinkedHashMap<String, String> insertMap = new LinkedHashMap<>();

      for (Object entObj : map.entrySet()) {
        Entry<String, String> entry = (Entry<String, String>) entObj;
        String oldKey = entry.getKey();
        // Collapse each run of newlines and other whitespace into one space
        String newKey = oldKey.replaceAll("[\\n\\s]+", " ");
        String value = entry.getValue();

        deleteList.add(oldKey);
        insertMap.put(newKey, value);
      }

      // Delete the old ...
      for (String oldKey : deleteList) {
        map.remove(oldKey);
      }

      // and bring in the new
      map.putAll(insertMap);
    }

    ObjectMapper mapper = new ObjectMapper();

    // Write JSON formatted data to output.json file
    mapper.writerWithDefaultPrettyPrinter().writeValue(output, readData);

    // Write JSON formatted data to stdout
    System.out.println(mapper.writerWithDefaultPrettyPrinter().writeValueAsString(readData));
  }
}

It seems like there should be a better way to achieve this.
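
One possibly tidier variant, offered as a sketch rather than a definitive fix: instead of collecting keys to delete and then re-inserting, build a fresh LinkedHashMap per row in a single pass. The class name KeyRewriter and the method renameKeys are hypothetical. The sketch uses only the JDK and assumes the rows have already been read (for example by CsvMapper) into maps with String keys and values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class KeyRewriter {

  // Hypothetical helper: returns new rows whose keys have been passed through
  // keyFn. Column order is preserved because LinkedHashMap keeps insertion order.
  // If two keys map to the same new name, the later value overwrites the earlier.
  static List<Map<String, String>> renameKeys(List<Map<String, String>> rows,
                                              UnaryOperator<String> keyFn) {
    List<Map<String, String>> result = new ArrayList<>(rows.size());
    for (Map<String, String> row : rows) {
      Map<String, String> cleaned = new LinkedHashMap<>();
      for (Map.Entry<String, String> entry : row.entrySet()) {
        cleaned.put(keyFn.apply(entry.getKey()), entry.getValue());
      }
      result.add(cleaned);
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> row = new LinkedHashMap<>();
    row.put("PARENT\nITEM\nNUMBER", "208E8840040");
    row.put("EFFECTIVE \nTHRU DATE", "2020-09-11"); // sample value, made up for the demo

    List<Map<String, String>> rows = new ArrayList<>();
    rows.add(row);

    // Collapse any whitespace run, including newlines, into a single space.
    System.out.println(renameKeys(rows, k -> k.replaceAll("\\s+", " ").trim()));
  }
}
```

The returned list can be handed to ObjectMapper.writerWithDefaultPrettyPrinter() in place of the original rows. The input maps are left unmodified, which avoids the unchecked casts and the separate delete step.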

huangapple
  • Posted on September 11, 2020, 01:38:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63834992.html