如何基于唯一列在Java中合并CSV文件。

huangapple go评论133阅读模式
英文:

How to merge csv files based on unique columns in java

问题

我有三个不同的非常大的文件,我的要求是将它们合并成一个单独的CSV文件。这三个文件都有不同数量的列,但它们之间包含一些唯一的列。

**示例**
文件1:

    学生编号     学生姓名      班级编号
    1           Ajay         6
    2           Vinay        8
    3           Geeta        6
    4           Sameer       7

文件2:

    班级编号       班级颜色
    6           蓝色
    7           灰色
    8           白色

文件3:

    房屋编号    房屋名称     学生编号
    1         骑士         4
    2         斯巴达人     1
    3         武士         2
    4         特洛伊人     3

正如您所看到的,文件1和文件2共有班级编号,文件1和文件3共有学生编号。根据这些信息,我想要通过基于共同键的连接来获得以下结果。

    学生编号     学生姓名     班级编号     班级颜色     房屋编号    房屋名称
    1           Ajay        6           蓝色         2          斯巴达人
    2           Vinay       8           白色         3          武士
    3           Geeta       6           蓝色         4          特洛伊人
    4           Sameer      7           灰色         1          骑士

这种情况对我来说是全新的,我进行了一些研究,但在Java中没有找到解决方案。
英文:

I have 3 different very large files and my requirement is to merge them together in one single csv file. All these 3 files have different number of columns but they contain some unique column between each other.

Example
file 1:

StudentId   StudentName   ClassId
1           Ajay           6
2           Vinay          8
3           Geeta          6
4           Sameer         7

file 2:

ClassId       ClassColor
6           Blue
7           Grey
8           White

file 3:

HouseId    HouseName    StudentId
1          Knights       4
2          Spartans      1
3          Samurai       2
4          Trojans       3

As you can see file1 and file2 has classId in common And file1 and file2 has studentId in common.
Now based on this information I want have result like this by joining files based on common key.

StudentId     StudentName    ClassId     ClassColor    HouseId    HouseName
1             Ajay           6           Blue          2          Spartans
2             Vinay          8           White         3          Samurai
3             Geeta          6           Blue          4          Trojans
4             Sameer         7           Grey          1          Knights

This scenario is completely new to me, I did some research but didn't find solution for it in java.

答案1

得分: 0

以下是您提供的代码的中文翻译部分:

用于保存CSV文件数据的类:

import java.util.LinkedHashMap;
import java.util.Map;
public class CsvVo {
    private Map<String, String> keyVal;
    public CsvVo(String id) {
        keyVal = new LinkedHashMap<>();// 如果不需要保持顺序,也可以使用HashMap
    }

    public Map<String, String> getKeyVal() {
        return keyVal;
    }
    public void setKeyVal(Map<String, String> keyVal) {
        this.keyVal = keyVal;
    }
    public void put(String key, String val) {
        keyVal.put(key, val);
    }
    public String get(String key) {
        return keyVal.get(key);
    }
}

用于读取CSV文件的CSV解析器:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
public class CsvParser {
    public static List<CsvVo> getRecodrsFromACsv(File file, List<String> keys) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<CsvVo> records = new ArrayList<>();
        boolean isHeader = true;
        String line = null;
        while ((line = br.readLine()) != null) {
            if (isHeader) {// 第一行是标题
                isHeader = false;
                continue;
            }
            CsvVo record = new CsvVo(file.getName());
            String[] lineSplit = line.split(",");
            for (int i = 0; i < lineSplit.length; i++) {
                record.put(keys.get(i), lineSplit[i]);
            }
            records.add(record);
        }
        br.close();
        return records;
    }
    public static List<String> getHeadersFromACsv(File file) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(file));
        List<String> headers = null;
        String line = null;
        while ((line = br.readLine()) != null) {
            String[] lineSplit = line.split(",");
            headers = new ArrayList<>(Arrays.asList(lineSplit));
            break;
        }
        br.close();
        return headers;
    }
    public static void writeToCsv(final File file, final Set<String> headers, final List<CsvVo> records)
            throws IOException {
        FileWriter csvWriter = new FileWriter(file);
        // 写入标题
        String sep = "";
        String[] headersArr = headers.toArray(new String[headers.size()]);
        for (String header : headersArr) {
            csvWriter.append(sep);
            csvWriter.append(header);
            sep = ",";
        }
        csvWriter.append("\n");
        // 在每一行写入记录
        for (CsvVo record : records) {
            sep = "";
            for (int i = 0; i < headersArr.length; i++) {
                csvWriter.append(sep);
                csvWriter.append(record.get(headersArr[i]));
                sep = ",";
            }
            csvWriter.append("\n");
        }
        csvWriter.flush();
        csvWriter.close();
    }
}

用于测试功能的主类:

import java.io.File;
import java.io.IOException;
import java.util.*;

public class CsvApplication {
    public static void main(String[] args) throws IOException {
        File csv1 = new File("/Users/guru/Desktop/Standard.csv");
        File csv2 = new File("/Users/guru/Desktop/Match.csv");
        List<String> csv1Headers = CsvParser.getHeadersFromACsv(csv1);
        List<String> csv2Headers = CsvParser.getHeadersFromACsv(csv2);
        List<String> allCsvHeaders = new ArrayList<>();
        allCsvHeaders.addAll(csv1Headers);
        allCsvHeaders.addAll(csv2Headers);
        Set<String> uniqueHeaders = new HashSet<>(allCsvHeaders);
        List<CsvVo> csv1Records = CsvParser.getRecodrsFromACsv(csv1, csv1Headers);
        List<CsvVo> csv2Records = CsvParser.getRecodrsFromACsv(csv2, csv2Headers);
        List<CsvVo> allCsvRecords = new ArrayList<>();
        String key = "StudentID";
        getUniqueRecordsForKey(key, csv1Records, csv2Records, allCsvRecords);
        CsvParser.writeToCsv(new File("/Users/guru/Desktop/Output.csv"), uniqueHeaders, allCsvRecords);
    }

    private static void getUniqueRecordsForKey(String key, List<CsvVo> csv1Records, List<CsvVo> csv2Records, List<CsvVo> allCsvRecords) {

        for (CsvVo record1 : csv1Records) {
            for (CsvVo record2 : csv2Records) {
                if (!record1.getKeyVal().get(key).isEmpty() && record1.getKeyVal().get(key).equals(record2.getKeyVal().get(key))) {
                    HashMap<String, String> mergedMap = new LinkedHashMap();
                    CsvVo mergedRecord = new CsvVo(record2.getKeyVal().get(key));
                    mergeRecords(mergedMap, record1, record2);
                    mergedRecord.setKeyVal(mergedMap);
                    csv2Records.remove(record2);
                    allCsvRecords.add(mergedRecord);
                    break;
                }
            }
        }
    }

    private static void mergeRecords(HashMap mergedMap, CsvVo record1, CsvVo record2) {
        mergedMap.putAll(record1.getKeyVal());
        mergedMap.putAll(record2.getKeyVal());
    }
}

请注意,这只是代码的翻译部分,不包括问题的回答。如果您有其他需要,请随时提问。

英文:

At this stage I am able to join two files with below implementation:

Class to hold csv file data:

import java.util.LinkedHashMap;
import java.util.Map;
public class CsvVo {
private Map&lt;String, String&gt; keyVal;
public CsvVo(String id) {
keyVal = new LinkedHashMap&lt;&gt;();// you may also use HashMap if you don&#39;t need to keep order
}
public Map&lt;String, String&gt; getKeyVal() {
return keyVal;
}
public void setKeyVal(Map&lt;String, String&gt; keyVal) {
this.keyVal = keyVal;
}
public void put(String key, String val) {
keyVal.put(key, val);
}
public String get(String key) {
return keyVal.get(key);
}
}

CSV parser to read files:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
public class CsvParser {
public static List&lt;CsvVo&gt; getRecodrsFromACsv(File file, List&lt;String&gt; keys) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
List&lt;CsvVo&gt; records = new ArrayList&lt;&gt;();
boolean isHeader = true;
String line = null;
while ((line = br.readLine()) != null) {
if (isHeader) {// first line is header
isHeader = false;
continue;
}
CsvVo record = new CsvVo(file.getName());
String[] lineSplit = line.split(&quot;,&quot;);
for (int i = 0; i &lt; lineSplit.length; i++) {
record.put(keys.get(i), lineSplit[i]);
}
records.add(record);
}
br.close();
return records;
}
public static List&lt;String&gt; getHeadersFromACsv(File file) throws IOException {
BufferedReader br = new BufferedReader(new FileReader(file));
List&lt;String&gt; headers = null;
String line = null;
while ((line = br.readLine()) != null) {
String[] lineSplit = line.split(&quot;,&quot;);
headers = new ArrayList&lt;&gt;(Arrays.asList(lineSplit));
break;
}
br.close();
return headers;
}
public static void writeToCsv(final File file, final Set&lt;String&gt; headers, final List&lt;CsvVo&gt; records)
throws IOException {
FileWriter csvWriter = new FileWriter(file);
// write headers
String sep = &quot;&quot;;
String[] headersArr = headers.toArray(new String[headers.size()]);
for (String header : headersArr) {
csvWriter.append(sep);
csvWriter.append(header);
sep = &quot;,&quot;;
}
csvWriter.append(&quot;\n&quot;);
// write records at each line
for (CsvVo record : records) {
sep = &quot;&quot;;
for (int i = 0; i &lt; headersArr.length; i++) {
csvWriter.append(sep);
csvWriter.append(record.get(headersArr[i]));
sep = &quot;,&quot;;
}
csvWriter.append(&quot;\n&quot;);
}
csvWriter.flush();
csvWriter.close();
}
}

Main Class to test the functionality:

import java.io.File;
import java.io.IOException;
import java.util.*;
public class CsvApplication {
public static void main(String[] args) throws IOException {
File csv1 = new File(&quot;/Users/guru/Desktop/Standard.csv&quot;);
File csv2 = new File(&quot;/Users/guru/Desktop/Match.csv&quot;);
List&lt;String&gt; csv1Headers = CsvParser.getHeadersFromACsv(csv1);
//csv1Headers.forEach(h -&gt; System.out.print(h + &quot; &quot;));
//System.out.println();
List&lt;String&gt; csv2Headers = CsvParser.getHeadersFromACsv(csv2);
//csv2Headers.forEach(h -&gt; System.out.print(h + &quot; &quot;));
//System.out.println();
List&lt;String&gt; allCsvHeaders = new ArrayList&lt;&gt;();
allCsvHeaders.addAll(csv1Headers);
allCsvHeaders.addAll(csv2Headers);
//allCsvHeaders.forEach(h -&gt; System.out.print(h + &quot; &quot;));
//System.out.println();
Set&lt;String&gt; uniqueHeaders = new HashSet&lt;&gt;(allCsvHeaders);
//uniqueHeaders.forEach(h -&gt; System.out.print(h + &quot; &quot;));
//System.out.println();
List&lt;CsvVo&gt; csv1Records = CsvParser.getRecodrsFromACsv(csv1, csv1Headers);
List&lt;CsvVo&gt; csv2Records = CsvParser.getRecodrsFromACsv(csv2, csv2Headers);
List&lt;CsvVo&gt; allCsvRecords = new ArrayList&lt;&gt;();
String key = &quot;StudentID&quot;;
getUniqueRecordsForKey(key, csv1Records, csv2Records,allCsvRecords);
// allCsvRecords.addAll(csv1Records);
//allCsvRecords.addAll(csv2Records);
CsvParser.writeToCsv(new File(&quot;/Users/guru/Desktop/Output.csv&quot;), uniqueHeaders, allCsvRecords);
}
private static void getUniqueRecordsForKey(String key, List&lt;CsvVo&gt; csv1Records, List&lt;CsvVo&gt; csv2Records, List&lt;CsvVo&gt; allCsvRecords) {
for (CsvVo record1: csv1Records){
for(CsvVo record2: csv2Records){
if( !record1.getKeyVal().get(key).isEmpty() &amp;&amp; record1.getKeyVal().get(key).equals(record2.getKeyVal().get(key))){
HashMap&lt;String ,String&gt; mergedMap = new LinkedHashMap();
CsvVo mergedRecord = new CsvVo(record2.getKeyVal().get(key));
mergeRecords(mergedMap, record1,record2);
mergedRecord.setKeyVal(mergedMap);
csv2Records.remove(record2);
allCsvRecords.add(mergedRecord);
break;
}
}
}
}
private static void mergeRecords(HashMap mergedMap,CsvVo record1, CsvVo record2 ){
mergedMap.putAll(record1.getKeyVal());
mergedMap.putAll(record2.getKeyVal());
}
}

Here we are loading entire data of files to lists and then comparing data based on the column we want to join. In this case i have taken StudentID.

Not sure if this is good approach for big files.

huangapple
  • 本文由 发表于 2020年10月8日 15:13:17
  • 转载请务必保留本文链接:https://go.coder-hub.com/64257495.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定