Merge 2 CSV files by a common column

Question

I have 2 CSV files, first.csv and second.csv. They have one column in common.

Example:
first : a b c d
second: x y a z

I have to create a third csv file that looks like this:
third : a b c d x y z

The files do not have the same number of entries; I must only merge the lines that have the same value in the shared column.
Also, the paths of the 3 CSV files must be passed as parameters.

I was trying to do this in Java but Python would also work!

I don't really know what I should do :(

Answer 1

Score: 1

If they will always have exactly one shared column and you want to merge the records (lines) that have the same value in that column, then the following code might help you:

```python
import pandas as pd


def merge_csv_files(first_file_path, second_file_path, output_file_path):
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # Column names that appear in both files
    shared_columns = set(first_df.columns) & set(second_df.columns)
    # Require exactly 1 shared column
    if len(shared_columns) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    shared_column = shared_columns.pop()

    # Inner join: keep only rows whose shared-column value appears in both files
    merged_df = pd.merge(first_df, second_df, on=shared_column, how='inner')

    merged_df.to_csv(output_file_path, index=False)


first_file_path = 'first.csv'
second_file_path = 'second.csv'
output_file_path = 'third.csv'

merge_csv_files(first_file_path, second_file_path, output_file_path)
```
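
If pandas is not an option, the same inner join can probably be done with the standard library alone. The following is only a rough sketch under a few assumptions that are not in the question: the script is saved as `merge_csv.py` (a made-up name), both files have a header row, and every row is well formed. It reads the three paths from the command line, as the question asks.

```python
import csv
import sys
from collections import defaultdict


def read_csv(path):
    """Return (column names, list of row dicts) for a CSV file with a header row."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return list(reader.fieldnames or []), rows


def merge_csv_files(first_path, second_path, output_path):
    """Inner-join two CSV files on their single shared column."""
    first_cols, first_rows = read_csv(first_path)
    second_cols, second_rows = read_csv(second_path)

    # Exactly one column name must appear in both headers
    shared = [c for c in first_cols if c in second_cols]
    if len(shared) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    key = shared[0]

    # Index the second file by the value of the shared column
    index = defaultdict(list)
    for row in second_rows:
        index[row[key]].append(row)

    # Output columns: all of the first file, then the rest of the second
    out_cols = first_cols + [c for c in second_cols if c != key]
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=out_cols, extrasaction="ignore")
        writer.writeheader()
        for row in first_rows:
            # Rows without a match in the second file are skipped (inner join)
            for match in index.get(row[key], []):
                writer.writerow({**match, **row})


if __name__ == "__main__":
    if len(sys.argv) == 4:
        merge_csv_files(sys.argv[1], sys.argv[2], sys.argv[3])
    else:
        print("usage: merge_csv.py FIRST.csv SECOND.csv OUTPUT.csv", file=sys.stderr)
```

It would be run as `python merge_csv.py first.csv second.csv third.csv`. Note that this version compares the shared values as plain strings, whereas pandas infers column types, so values such as `1` and `1.0` match in pandas but not here.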

huangapple
  • Posted on July 13, 2023, 19:19:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76678781.html