How to merge multiple CSV files?
Question
I have several CSV files in which the first element of each row is the same across files. For example:
csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0
csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0
Given these files (there are more than two), how can I write a Python function that merges them into a table like the following?
| left | csv-1 | csv-2 |
|:--------- |:-----:| -----:|
| Value | 0 | 0 |
| Currency | 0 | 1 |
| datetime | 0 | 0 |
You can write this function in Python. Here is an example:
import pandas as pd
from pathlib import Path

# Read each file into a DataFrame indexed by its first column,
# and rename the value column to the file name
def create_merged_dataframe(file_paths):
    dataframes = []
    for file_path in file_paths:
        df = pd.read_csv(file_path, index_col=0, header=None)
        filename = Path(file_path).stem  # file name without the extension
        df = df.rename(columns={1: filename})
        dataframes.append(df)
    # Concatenate the DataFrames column-wise, aligning rows on the index
    merged_df = pd.concat(dataframes, axis=1)
    return merged_df

# Pass in the list of file paths
file_paths = ['csv-1.csv', 'csv-2.csv']  # add more file paths here
merged_dataframe = create_merged_dataframe(file_paths)
# Print the merged DataFrame
print(merged_dataframe)
Add more paths to the file_paths list to merge more files. The function returns a merged DataFrame in which each file's data forms one column, named after the file.
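To see how the column-wise merge aligns rows, here is a minimal self-contained sketch that uses in-memory CSV text instead of files. The `merge_csv_texts` helper and the sample data are hypothetical, made up for illustration. The second sample deliberately omits the `datetime` row to show that `pd.concat(..., axis=1)` fills keys missing from a file with NaN.

```python
import io
import pandas as pd

def merge_csv_texts(named_texts):
    """Merge {name: csv_text} pairs column-wise, keyed on the first CSV column.

    Hypothetical helper for illustration; each CSV is assumed to have
    two columns and no header row.
    """
    frames = []
    for name, text in named_texts.items():
        df = pd.read_csv(io.StringIO(text), header=None, index_col=0)
        frames.append(df.rename(columns={1: name}))
    # axis=1 aligns rows on the index (an outer join by default)
    merged = pd.concat(frames, axis=1)
    # Move the index into a regular column named "left"
    return merged.rename_axis("left").reset_index()

# Made-up sample data standing in for csv-1.csv and csv-2.csv
samples = {
    "csv-1": "Value,0\nCurrency,0\ndatetime,0\n",
    "csv-2": "Value,0\nCurrency,1\n",  # no datetime row
}
result = merge_csv_texts(samples)
print(result)
```

Because of the missing row, the `csv-2` column is promoted to a floating-point dtype so that it can hold NaN.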
Answer 1
Score: 2
First, index each DataFrame by its first column, which is the key you will join on:
import pandas as pd

# The files have no header row, so supply the column names explicitly
df1 = pd.read_csv('csv-1.csv', header=None, names=['col1', 'csv-1'])
df2 = pd.read_csv('csv-2.csv', header=None, names=['col1', 'csv-2'])
df1 = df1.set_index('col1')
df2 = df2.set_index('col1')
df = df1.join(df2, how='outer')
Then rename the columns if needed, or reset the index to create a new one.
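For more than two files, the same join can be folded over a list of DataFrames. The following is a minimal sketch, assuming each file has two columns and no header row. `load_indexed` and `join_all` are hypothetical helper names, and the in-memory sample text stands in for real files.

```python
import io
from functools import reduce

import pandas as pd

def load_indexed(source, name):
    """Read a two-column, header-less CSV and index it by the first column."""
    df = pd.read_csv(source, header=None, names=["col1", name])
    return df.set_index("col1")

def join_all(frames):
    """Outer-join a list of DataFrames on their shared index."""
    return reduce(lambda left, right: left.join(right, how="outer"), frames)

# In-memory stand-ins for three CSV files (sample data is made up)
texts = {
    "csv-1": "Value,0\nCurrency,0\n",
    "csv-2": "Value,0\nCurrency,1\n",
    "csv-3": "Value,5\nCurrency,2\n",
}
frames = [load_indexed(io.StringIO(text), name) for name, text in texts.items()]
joined = join_all(frames)

# Turn the index back into a regular column named "left"
table = joined.rename_axis("left").reset_index()
print(table)
```

To read real files instead, pass a file path as `source`; `pd.read_csv` accepts both paths and file-like objects.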
Answer 2
Score: 1
You can use:
import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
                  for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())
Output:
>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
Answer 3
Score: 1
Here's what you can do:
import pandas as pd
from glob import glob

def refineFilename(path):
    # Strip the extension: "csv-1.csv" -> "csv-1"
    return path.split(".")[0]

df = pd.DataFrame()
for file in sorted(glob("csv-*.csv")):  # sorted for a stable column order
    new = pd.read_csv(file, header=None, index_col=[0])
    df[refineFilename(file)] = new[1]
df.reset_index(inplace=True)
df.rename(columns={0: "left"}, inplace=True)
print(df)
"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""
What we are doing here is using the df variable to store all the data, then iterating through the files and adding the second column of each file to df, with the file name as the column name.