I need to take a CSV file and split it into separate files based on column header [JAVA]

Question

I am fairly new to Java and am struggling to read, sort, and export a CSV file. I have a CSV with the headers [X, Y, Z, Scalar 1, Scalar 2, Scalar 3, Scalar 4] that needs to be separated into 4 CSV files. The actual file is thousands of lines long, so here is a short example:

[X,Y,Z, Sc1, Sc2, Sc3, Sc4]
[1,0,0,   5,   7,   9,  10]
[0,1,1,   6,   8,   4,   0]
[0,0,1,   3,   3,   8,   2]

I need to split the source CSV into 4 separate CSV files, each containing one scalar value and the X, Y, Z data.

File 1       | File 2       | File 3       | File 4
----------------------------------------------------------
[Sc1, X,Y,Z] | [Sc2, X,Y,Z] | [Sc3, X,Y,Z] | [Sc4,  X,Y,Z]
[5,   1,0,0] | [7,   1,0,0] | [9,   1,0,0] | [10,   1,0,0]
[6,   0,1,1] | [8,   0,1,1] | [4,   0,1,1] | [ 0,   0,1,1]
[3,   0,0,1] | [3,   0,0,1] | [8,   0,0,1] | [ 2,   0,0,1]

I am currently reading the data in with BufferedReader, but I am not sure how to organize the data once it's read, or whether this is even a good approach.

ArrayList<String> readFileFast(String expDir, String filename) {
    String path = expDir + filename;
    ArrayList<String> fileContents = new ArrayList<>();
    try {
        BufferedReader br = new BufferedReader(new FileReader(path));
        String line;
        while ((line = br.readLine()) != null) {
            fileContents.add(line);
        }
    } catch (Exception e) {
        SuperStackPrint(e);
    }
    return fileContents;
}

println(readFileFast(expDir, "/DELETEME.csv"));

Any insight on how to do this properly would be appreciated.

Answer 1

Score: 0

<details>
<summary>English:</summary>
You will benefit from using a library which specializes in reading and writing CSV files. There are a few to choose from, but here I will use [OpenCSV][1].

If you don't end up using this library, it may at least give you some ideas for your own approach.

Also, when using libraries, I recommend using a tool such as Maven or Gradle to help manage them, as these tools take care of "dependencies of dependencies" for you - for example, where the OpenCSV library itself needs access to other libraries which it uses.

For Maven, here is the OpenCSV dependency from my POM file:

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.2</version>
</dependency>

The approach:

1) Create a Java class (a "bean") to hold the data that will be loaded from the source CSV file. This will be called `SplitBean` in my example.

2) Create a collection of objects using this class, where ***each object contains the data for one row of the CSV file***.

3) Iterate across this collection of objects, writing the relevant parts to 4 output files.

*You can choose to follow the above approach without using OpenCSV or a similar library. But you will have to write more of your own code relating to basic CSV operations. In your case, the data is not complicated, so that would not be unreasonable.*

*Either way, I recommend creating a class to represent a row of input data, and then processing a list of such objects when writing to your output files. This splits the process into 2 distinct steps, and makes use of Java objects to simplify the process.*
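
For comparison, the library-free route mentioned above might look roughly like the following. This is only a sketch under stated assumptions: the class name `PlainCsvSplitter`, its method names, and the `outputN.csv` file names are all hypothetical, the first three columns are assumed to be X, Y, Z, and fields are assumed to contain no quoted or embedded commas (which a real CSV library would handle for you).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenCSV: splits on plain commas only.
class PlainCsvSplitter {

    // Assumption: the first three columns are the shared X, Y, Z coordinates.
    static final int COORD_COUNT = 3;

    // Builds one output line: the chosen scalar cell first, then X, Y, Z.
    static String project(String[] cells, int scalarIndex) {
        List<String> out = new ArrayList<>();
        out.add(cells[scalarIndex].trim());
        for (int i = 0; i < COORD_COUNT; i++) {
            out.add(cells[i].trim());
        }
        return String.join(",", out);
    }

    // Returns one list of output lines per scalar column.
    // The header is projected like any other row, so it becomes e.g. "Sc1,X,Y,Z".
    static List<List<String>> splitLines(List<String> lines) {
        List<List<String>> outputs = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = line.split(",", -1);
            if (outputs.isEmpty()) {
                // The first non-blank line decides how many outputs exist.
                for (int c = COORD_COUNT; c < cells.length; c++) {
                    outputs.add(new ArrayList<>());
                }
            }
            for (int k = 0; k < outputs.size(); k++) {
                outputs.get(k).add(project(cells, COORD_COUNT + k));
            }
        }
        return outputs;
    }

    // Reads the input file and writes output1.csv, output2.csv, ... into outputDir.
    static void splitFile(Path input, Path outputDir) throws IOException {
        List<List<String>> outputs = splitLines(Files.readAllLines(input));
        for (int k = 0; k < outputs.size(); k++) {
            Files.write(outputDir.resolve("output" + (k + 1) + ".csv"), outputs.get(k));
        }
    }
}
```

A call such as `PlainCsvSplitter.splitFile(Paths.get("C:/tmp/csvsplit/input.csv"), Paths.get("C:/tmp/csvsplit"))` would then produce one file per scalar column. The sketch does no validation, so a row with fewer cells than the header would fail with an `ArrayIndexOutOfBoundsException`.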

Here is the `SplitBean` class:

import com.opencsv.bean.CsvBindByName;

public class SplitBean {

    @CsvBindByName(column = "X")
    private int x;

    @CsvBindByName(column = "Y")
    private int y;

    @CsvBindByName(column = "Z")
    private int z;

    @CsvBindByName(column = "Sc1")
    private int sc1;

    @CsvBindByName(column = "Sc2")
    private int sc2;

    @CsvBindByName(column = "Sc3")
    private int sc3;

    @CsvBindByName(column = "Sc4")
    private int sc4;

    // column headings for each of the 4 output files:
    public static String[] getHeadingsOne() {
        String[] s = { "Sc1", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsTwo() {
        String[] s = { "Sc2", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsThree() {
        String[] s = { "Sc3", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsFour() {
        String[] s = { "Sc4", "X", "Y", "Z" };
        return s;
    }

    // one row of data for each of the 4 output files:
    public String[] getDataOne() {
        String[] i = { String.valueOf(sc1), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataTwo() {
        String[] i = { String.valueOf(sc2), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataThree() {
        String[] i = { String.valueOf(sc3), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataFour() {
        String[] i = { String.valueOf(sc4), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public int getX() {
        return x;
    }

    public void setX(int x) {
        this.x = x;
    }

    public int getY() {
        return y;
    }

    public void setY(int y) {
        this.y = y;
    }

    public int getZ() {
        return z;
    }

    public void setZ(int z) {
        this.z = z;
    }

    public int getSc1() {
        return sc1;
    }

    public void setSc1(int sc1) {
        this.sc1 = sc1;
    }

    public int getSc2() {
        return sc2;
    }

    public void setSc2(int sc2) {
        this.sc2 = sc2;
    }

    public int getSc3() {
        return sc3;
    }

    public void setSc3(int sc3) {
        this.sc3 = sc3;
    }

    public int getSc4() {
        return sc4;
    }

    public void setSc4(int sc4) {
        this.sc4 = sc4;
    }

}

This class uses `@CsvBindByName` annotations to map from column heading names in the source CSV file to field names in the class itself. You do not need to do things this way, but it's a convenient feature provided by OpenCSV.

The class also contains methods which handle the 4 different output files (which are subsets of the input file's data).

Now we can write a separate `doTheSplit()` method, to use this class:

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.bean.HeaderColumnNameMappingStrategy;
import com.opencsv.exceptions.CsvDataTypeMismatchException;
import com.opencsv.exceptions.CsvRequiredFieldEmptyException;
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.FileWriter;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class SplitData {

    public void doTheSplit() throws URISyntaxException, IOException,
            CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {
        // map CSV columns to SplitBean fields by header name:
        HeaderColumnNameMappingStrategy<SplitBean> msIn = new HeaderColumnNameMappingStrategy<>();
        msIn.setType(SplitBean.class);

        Path path = Paths.get("C:/tmp/csvsplit/input.csv");
        List<SplitBean> list;

        // read the data from the input CSV file into our SplitBean list:
        try ( Reader reader = Files.newBufferedReader(path)) {
            CsvToBean<SplitBean> cb = new CsvToBeanBuilder<SplitBean>(reader)
                    .withMappingStrategy(msIn)
                    .build();
            list = cb.parse();
        }

        // set up 4 file writers:
        try ( CSVWriter writer1 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output1.csv"));
                CSVWriter writer2 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output2.csv"));
                CSVWriter writer3 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output3.csv"));
                CSVWriter writer4 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output4.csv"))) {

            // first write the headers to each file (false = no quotes):
            writer1.writeNext(SplitBean.getHeadingsOne(), false);
            writer2.writeNext(SplitBean.getHeadingsTwo(), false);
            writer3.writeNext(SplitBean.getHeadingsThree(), false);
            writer4.writeNext(SplitBean.getHeadingsFour(), false);

            // then write each row of data (false = no quotes):
            for (SplitBean item : list) {
                writer1.writeNext(item.getDataOne(), false);
                writer2.writeNext(item.getDataTwo(), false);
                writer3.writeNext(item.getDataThree(), false);
                writer4.writeNext(item.getDataFour(), false);
            }
        }
    }

}

The first part of this code populates a `List<SplitBean> list`. There is one `SplitBean` object for each row of data from the input CSV file. OpenCSV takes care of most of the work for you, behind the scenes.

Then, the code creates 4 file writers, which use the OpenCSV `CSVWriter` object, to help handle the formatting of our data into valid CSV rows.

With this code, we write column headers into each of the 4 files. Finally we iterate across our collection of `SplitBean` items, and write the relevant data subsets to each file.

So, for a CSV input file such as this:

X,Y,Z,Sc1,Sc2,Sc3,Sc4
1,0,0,5,7,9,10
0,1,1,6,8,4,0
0,0,1,3,3,8,2

We end up with 4 different output files. One example:

Sc1,X,Y,Z
5,1,0,0
6,0,1,1
3,0,0,1

**Additional note**: One big advantage of using the `SplitBean` class in this way is that you have a lot more flexibility if you decide you need to perform more data transformations - for example, filtering out rows of data, or sorting data in different ways.

[1]: http://opencsv.sourceforge.net/
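
To illustrate the additional note, here is a rough sketch of filtering and sorting a list of row objects before writing them out. The nested `Row` class is a hypothetical stand-in for `SplitBean`, stripped of its OpenCSV annotations so that the example is self-contained. The `filterAndSort` method and its `minSc1` threshold are also made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class RowTransforms {

    // Hypothetical stand-in for SplitBean, reduced to the fields used here.
    static final class Row {
        final int x;
        final int y;
        final int z;
        final int sc1;

        Row(int x, int y, int z, int sc1) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.sc1 = sc1;
        }

        int getSc1() {
            return sc1;
        }

        // Same shape as SplitBean.getDataOne(): scalar first, then X, Y, Z.
        String[] getDataOne() {
            return new String[] { String.valueOf(sc1), String.valueOf(x),
                String.valueOf(y), String.valueOf(z) };
        }
    }

    // Keeps rows whose Sc1 is at least minSc1, ordered by Sc1 from largest to smallest.
    static List<Row> filterAndSort(List<Row> rows, int minSc1) {
        return rows.stream()
                .filter(r -> r.getSc1() >= minSc1)
                .sorted(Comparator.comparingInt(Row::getSc1).reversed())
                .collect(Collectors.toList());
    }
}
```

The returned list could then be passed to the same writing loop as before, so the transformation step stays separate from both reading and writing.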
</details>

huangapple
  • Posted on August 11, 2020, 02:33:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63346048.html