Merge 2 csv files by common column
Question
I have 2 CSV files, first.csv and second.csv. They both have a shared column.
Example:
first : a b c d
second: x y a z
I have to create a third csv file that looks like this:
third : a b c d x y z
The files do not have the same number of entries, and I must only merge the lines that have the same value in the shared column.
Also, the paths of the 3 CSV files must be passed as parameters.
I was trying to do this in Java but Python would also work!
I don't really know what I should do.
Answer 1
Score: 1
If the files will always have exactly one shared column and you want to merge the records (lines) that have the same value in that column, then the following code might help you:
```python
import pandas as pd


def merge_csv_files(first_file_path, second_file_path, output_file_path):
    first_df = pd.read_csv(first_file_path)
    second_df = pd.read_csv(second_file_path)

    # Column names that appear in both files
    shared_column = set(first_df.columns) & set(second_df.columns)

    # look for exactly 1 shared column
    if len(shared_column) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    shared_column = shared_column.pop()

    # Inner join: keep only rows whose shared-column value occurs in both files
    merged_df = pd.merge(first_df, second_df, on=shared_column, how='inner')
    merged_df.to_csv(output_file_path, index=False)


first_file_path = 'first.csv'
second_file_path = 'second.csv'
output_file_path = 'third.csv'
merge_csv_files(first_file_path, second_file_path, output_file_path)
```
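The question also asks for the three paths to be passed as parameters, which the hardcoded file names above do not cover. Below is a minimal sketch of one way to do that. It is an assumption-laden variant, not the code from the answer: it swaps pandas for the standard-library `csv` module so it runs without extra installs, and the `main(argv)` helper and the argument order (first, second, output) are hypothetical choices made for illustration.

```python
import csv
import sys


def merge_csv_files(first_path, second_path, output_path):
    """Inner-join two CSV files on their single shared column (stdlib only)."""
    with open(first_path, newline="") as f:
        reader = csv.DictReader(f)
        first_cols = reader.fieldnames or []
        first_rows = list(reader)
    with open(second_path, newline="") as f:
        reader = csv.DictReader(f)
        second_cols = reader.fieldnames or []
        second_rows = list(reader)

    # Same check as the pandas version: exactly one column name in common.
    shared = [c for c in first_cols if c in second_cols]
    if len(shared) != 1:
        raise ValueError("The CSV files do not have exactly one shared column.")
    key = shared[0]

    # Index the second file by key value; one value may occur on several rows.
    index = {}
    for row in second_rows:
        index.setdefault(row[key], []).append(row)

    # Output columns: all of the first file, then the rest of the second.
    extra_cols = [c for c in second_cols if c != key]
    with open(output_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=first_cols + extra_cols)
        writer.writeheader()
        for row in first_rows:
            # Rows without a match in the second file are skipped (inner join).
            for match in index.get(row[key], []):
                merged = {c: row[c] for c in first_cols}
                merged.update({c: match[c] for c in extra_cols})
                writer.writerow(merged)


def main(argv):
    """Hypothetical entry point: argv = [program, first, second, output]."""
    if len(argv) != 4:
        print(f"usage: {argv[0]} FIRST.csv SECOND.csv OUTPUT.csv", file=sys.stderr)
        return 2
    merge_csv_files(argv[1], argv[2], argv[3])
    return 0
```

Saved as a script (say, a hypothetical `merge_csv.py`) with a final line `sys.exit(main(sys.argv))`, this could be run as `python merge_csv.py first.csv second.csv third.csv`. Note that this sketch compares key values as plain strings, whereas pandas infers column types, so results can differ when the shared column holds values such as `1` and `1.0`.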