August 11, 2020 02:33:55

I need to take a CSV file and split it into separate files based on Column Header [JAVA]

Question

I am fairly new to Java and am struggling to read > sort > export a csv. I have a csv with [X, Y, Z, Scalar 1, Scalar 2, Scalar 3, Scalar 4] as headers that need to be separated into 4 csv's. The actual file is thousands of lines long so short example:

[X,Y,Z, Sc1, Sc2, Sc3, Sc4]
[1,0,0,   5,   7,   9,  10]
[0,1,1,   6,   8,   4,   0]
[0,0,1,   3,   3,   8,   2]

I need to split the source csv into 4 separate csv's, each with one scalar value and the x,y,z data.

File 1       | File 2       | File 3       | File 4
----------------------------------------------------------
[Sc1, X,Y,Z] | [Sc2, X,Y,Z] | [Sc3, X,Y,Z] | [Sc4,  X,Y,Z]
[5,   1,0,0] | [7,   1,0,0] | [9,   1,0,0] | [10,   1,0,0]
[6,   0,1,1] | [8,   0,1,1] | [4,   0,1,1] | [ 0,   0,1,1]
[3,   0,0,1] | [3,   0,0,1] | [8,   0,0,1] | [ 2,   0,0,1]

I am currently reading the data in with BufferedReader, but I am not sure how to organize the data once it's read, or if this is even a good approach.

ArrayList<String> readFileFast(String expDir, String filename) {
    String path = expDir + filename;
    ArrayList<String> fileContents = new ArrayList<>();
    try {
        BufferedReader br = new BufferedReader(new FileReader(path));
        String line;
        while ((line = br.readLine()) != null) {
            fileContents.add(line);
        }
    } catch (Exception e) {
        SuperStackPrint(e);
    }
    return fileContents;
}
println(readFileFast(expDir, "/DELETEME.csv"));

Any insight on how to do this properly would be appreciated.

Answer 1

Score: 0

<details>
<summary>English:</summary>
You will benefit from using a library which specializes in reading and writing CSV files. There are a few to choose from, but here I will use [OpenCSV][1].

If you don't end up using this library, it may at least give you some ideas for your own approach.

Also, when using libraries, I recommend using a tool such as Maven or Gradle to manage them, as these tools take care of "dependencies of dependencies" for you - for example, where the OpenCSV library itself needs access to other libraries which it uses.

For Maven, here is the OpenCSV dependency from my POM file:

<dependency>
    <groupId>com.opencsv</groupId>
    <artifactId>opencsv</artifactId>
    <version>5.2</version>
</dependency>

The approach:

1) Create a Java class (a "bean") to hold the data that will be loaded from the source CSV file. This will be called `SplitBean` in my example.
2) Create a collection of objects using this class, where ***each object contains the data for one row of the CSV file***.
3) Iterate across this collection of objects, writing the relevant parts to 4 output files.

*You can choose to follow the above approach without using OpenCSV or a similar library. But you will have to write more of your own code relating to basic CSV operations. In your case, the data is not complicated, so that would not be unreasonable.*

*Either way, I recommend creating a class to represent a row of input data, and then processing a list of such objects when writing to your output files. This splits the process into 2 distinct steps, and makes use of Java objects to simplify the process.*
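
For comparison, here is a minimal sketch of the library-free route mentioned above, using only the standard library. It is not the answer's code: the class name `PlainCsvSplitter`, the `split` method, and the `outputN.csv` file names are made up for illustration. It assumes the first three columns are X, Y, Z, that every remaining column is a scalar, that no field contains a quoted or embedded comma, and that every row has as many cells as the header. Unlike the bean approach, it streams the file line by line instead of building a list of row objects first.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the answer): splits a simple CSV by scalar column.
class PlainCsvSplitter {

    // Assumption: the first three columns are X, Y, Z; all later columns are scalars.
    static final int COORD_COLUMNS = 3;

    // Writes one file per scalar column into outputDir and returns the created paths.
    static List<Path> split(Path input, Path outputDir) throws IOException {
        List<Path> outputs = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                if (writers.isEmpty()) {
                    // First non-blank line is the header: open one output per scalar column.
                    for (int s = COORD_COLUMNS; s < cells.length; s++) {
                        Path out = outputDir.resolve("output" + (s - COORD_COLUMNS + 1) + ".csv");
                        outputs.add(out);
                        writers.add(Files.newBufferedWriter(out));
                    }
                }
                // Header and data rows are written the same way: scalar first, then X, Y, Z.
                for (int s = 0; s < writers.size(); s++) {
                    BufferedWriter w = writers.get(s);
                    w.write(String.join(",", cells[COORD_COLUMNS + s], cells[0], cells[1], cells[2]));
                    w.newLine();
                }
            }
        } finally {
            // Close every writer that was opened, even if reading failed part-way.
            for (BufferedWriter w : writers) {
                w.close();
            }
        }
        return outputs;
    }
}
```

Calling `PlainCsvSplitter.split(Paths.get("input.csv"), Paths.get("."))` would produce `output1.csv` through `output4.csv` for the four-scalar example in the question. The trade-off is that filtering or sorting rows becomes harder than with a list of row objects.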

Here is the `SplitBean` class:

import com.opencsv.bean.CsvBindByName;

public class SplitBean {

    @CsvBindByName(column = "X")
    private int x;

    @CsvBindByName(column = "Y")
    private int y;

    @CsvBindByName(column = "Z")
    private int z;

    @CsvBindByName(column = "Sc1")
    private int sc1;

    @CsvBindByName(column = "Sc2")
    private int sc2;

    @CsvBindByName(column = "Sc3")
    private int sc3;

    @CsvBindByName(column = "Sc4")
    private int sc4;

    public static String[] getHeadingsOne() {
        String[] s = { "Sc1", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsTwo() {
        String[] s = { "Sc2", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsThree() {
        String[] s = { "Sc3", "X", "Y", "Z" };
        return s;
    }

    public static String[] getHeadingsFour() {
        String[] s = { "Sc4", "X", "Y", "Z" };
        return s;
    }

    public String[] getDataOne() {
        String[] i = { String.valueOf(sc1), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataTwo() {
        String[] i = { String.valueOf(sc2), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataThree() {
        String[] i = { String.valueOf(sc3), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public String[] getDataFour() {
        String[] i = { String.valueOf(sc4), String.valueOf(x),
            String.valueOf(y), String.valueOf(z) };
        return i;
    }

    public int getX() {
        return x;
    }

    public void setX(int x) {
        this.x = x;
    }

    public int getY() {
        return y;
    }

    public void setY(int y) {
        this.y = y;
    }

    public int getZ() {
        return z;
    }

    public void setZ(int z) {
        this.z = z;
    }

    public int getSc1() {
        return sc1;
    }

    public void setSc1(int sc1) {
        this.sc1 = sc1;
    }

    public int getSc2() {
        return sc2;
    }

    public void setSc2(int sc2) {
        this.sc2 = sc2;
    }

    public int getSc3() {
        return sc3;
    }

    public void setSc3(int sc3) {
        this.sc3 = sc3;
    }

    public int getSc4() {
        return sc4;
    }

    public void setSc4(int sc4) {
        this.sc4 = sc4;
    }

}

This class uses `@CsvBindByName` annotations to map from column heading names in the source CSV file to field names in the class itself. You do not need to do things this way, but it's a convenient feature provided by OpenCSV.

The class also contains methods which handle the 4 different output files (which are subsets of the input file's data).
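
The four `getHeadingsX()` and four `getDataX()` methods differ only in which scalar they pick, so they could be collapsed into one parameterized pair. The sketch below shows that idea on a plain class without OpenCSV annotations. `ScalarRow`, `headingsFor` and `dataFor` are hypothetical names, not part of the answer, and the scalars are held in an array here on the assumption that they are loaded by position rather than bound by column name.

```java
// Hypothetical, annotation-free variant of SplitBean: one row of X, Y, Z plus N scalars.
class ScalarRow {
    private final int x;
    private final int y;
    private final int z;
    private final int[] scalars;

    ScalarRow(int x, int y, int z, int... scalars) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.scalars = scalars.clone(); // defensive copy
    }

    // Heading row for output file n (1-based), e.g. n = 2 gives Sc2, X, Y, Z.
    static String[] headingsFor(int n) {
        return new String[] { "Sc" + n, "X", "Y", "Z" };
    }

    // Data row for output file n (1-based): the n-th scalar followed by X, Y, Z.
    String[] dataFor(int n) {
        return new String[] {
            String.valueOf(scalars[n - 1]),
            String.valueOf(x), String.valueOf(y), String.valueOf(z)
        };
    }
}
```

With this shape, the writing code can loop over `n` from 1 to the number of scalars instead of naming each writer and method separately, at the cost of losing the by-name column binding that the annotations provide.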

Now we can write a separate `doTheSplit()` method, to use this class:

import com.opencsv.bean.CsvToBean;
import com.opencsv.bean.CsvToBeanBuilder;
import com.opencsv.bean.HeaderColumnNameMappingStrategy;
import com.opencsv.exceptions.CsvDataTypeMismatchException;
import com.opencsv.exceptions.CsvRequiredFieldEmptyException;
import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.FileWriter;
import java.net.URISyntaxException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class SplitData {

    public void doTheSplit() throws URISyntaxException, IOException,
            CsvDataTypeMismatchException, CsvRequiredFieldEmptyException {
        // map CSV column headings to SplitBean fields via the @CsvBindByName annotations:
        HeaderColumnNameMappingStrategy<SplitBean> msIn = new HeaderColumnNameMappingStrategy<>();
        msIn.setType(SplitBean.class);

        Path path = Paths.get("C:/tmp/csvsplit/input.csv");
        List<SplitBean> list;

        // read the data from the input CSV file into our SplitBean list:
        try ( Reader reader = Files.newBufferedReader(path)) {
            CsvToBean<SplitBean> cb = new CsvToBeanBuilder<SplitBean>(reader)
                    .withMappingStrategy(msIn)
                    .build();
            list = cb.parse();
        }

        // set up 4 file writers:
        try ( CSVWriter writer1 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output1.csv"));
                CSVWriter writer2 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output2.csv"));
                CSVWriter writer3 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output3.csv"));
                CSVWriter writer4 = new CSVWriter(new FileWriter("C:/tmp/csvsplit/output4.csv"))) {

            // first write the headers to each file (false = no quotes):
            writer1.writeNext(SplitBean.getHeadingsOne(), false);
            writer2.writeNext(SplitBean.getHeadingsTwo(), false);
            writer3.writeNext(SplitBean.getHeadingsThree(), false);
            writer4.writeNext(SplitBean.getHeadingsFour(), false);

            // then write each row of data (false = no quotes):
            for (SplitBean item : list) {
                writer1.writeNext(item.getDataOne(), false);
                writer2.writeNext(item.getDataTwo(), false);
                writer3.writeNext(item.getDataThree(), false);
                writer4.writeNext(item.getDataFour(), false);
            }
        }
    }

}

The first part of this code populates a `List<SplitBean>` named `list`. There is one `SplitBean` object for each row of data from the input CSV file. OpenCSV takes care of most of the work for you, behind the scenes.

Then, the code creates 4 file writers, which use the OpenCSV `CSVWriter` object, to help handle the formatting of our data into valid CSV rows.

With this code, we write column headers into each of the 4 files. Finally we iterate across our collection of `SplitBean` items, and write the relevant data subsets to each file.

So, for a CSV input file such as this:

X,Y,Z,Sc1,Sc2,Sc3,Sc4
1,0,0,5,7,9,10
0,1,1,6,8,4,0
0,0,1,3,3,8,2

We end up with 4 different output files. One example:

Sc1,X,Y,Z
5,1,0,0
6,0,1,1
3,0,0,1

**Additional note**: One big advantage of using the `SplitBean` class in this way is that you have a lot more flexibility if you decide you need to perform more data transformations - for example, filtering out rows of data, or sorting data in different ways.
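
As a rough illustration of that flexibility, the sketch below filters and sorts a list of row objects before they would be written out. It uses a small stand-in class rather than the real `SplitBean` so that it compiles without OpenCSV. `RowTransforms`, `Row`, `filterAndSort` and the `minSc1` threshold are all hypothetical names chosen for this example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: transforms applied to the row list before writing.
class RowTransforms {

    // Minimal stand-in for SplitBean, holding only the fields this sketch needs.
    static class Row {
        final int x;
        final int y;
        final int z;
        final int sc1;

        Row(int x, int y, int z, int sc1) {
            this.x = x;
            this.y = y;
            this.z = z;
            this.sc1 = sc1;
        }
    }

    // Keeps rows whose Sc1 value is at least minSc1, ordered by Sc1 descending.
    static List<Row> filterAndSort(List<Row> rows, int minSc1) {
        return rows.stream()
                .filter(r -> r.sc1 >= minSc1)
                .sorted(Comparator.comparingInt((Row r) -> r.sc1).reversed())
                .collect(Collectors.toList());
    }
}
```

With the real bean, the same stream pipeline would presumably use the getters (for example `SplitBean::getSc1`) and run between the parse step and the write loop.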

[1]: http://opencsv.sourceforge.net/
</details>
