How to merge multiple CSV files?

Question

I have several CSV files that all have the same entries in their first column. For example:

csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

With these files (there are more than two of them), I want to merge them and create a table like this:

| left      | csv-1 | csv-2 |
|:--------- |:-----:| -----:|
| Value     |   0   |   0   |
| Currency  |   0   |   1   |
| datetime  |   0   |   0   |

How can I create this function in Python?

You can write such a function in Python. Here is an example:

import pandas as pd
from pathlib import Path

def create_merged_dataframe(file_paths):
    dataframes = []

    for file_path in file_paths:
        # Read a header-less two-column file, using the first column as the index
        df = pd.read_csv(file_path, index_col=0, header=None)
        filename = Path(file_path).stem  # file name without directory or extension
        # Name the value column after the file it came from
        df = df.rename(columns={1: filename})
        dataframes.append(df)

    # Concatenate the DataFrames side by side, aligning rows on the index
    merged_df = pd.concat(dataframes, axis=1)

    return merged_df

# Pass in the list of file paths
file_paths = ['csv-1.csv', 'csv-2.csv']  # add more file paths here
merged_dataframe = create_merged_dataframe(file_paths)

# Print the merged DataFrame
print(merged_dataframe)

Add more paths to file_paths to merge more files. The function returns a merged DataFrame in which each file contributes one column, named after the file.

Answer 1

Score: 2

First, you must set the first column as the index of each DataFrame, so that you can then join on it:

import pandas as pd

# The files have no header row, so supply the column names explicitly
df1 = pd.read_csv('csv-1.csv', header=None, names=['col1', 'csv-1'])
df2 = pd.read_csv('csv-2.csv', header=None, names=['col1', 'csv-2'])

df1 = df1.set_index('col1')
df2 = df2.set_index('col1')

df = df1.join(df2, how='outer')

Then rename the columns if needed, or create a new index.
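
Since the question mentions more than two files, the same index-then-join idea can be extended to a list of files. The following is a minimal sketch, assuming every file has exactly two columns and no header row. The helper name `join_csv_files` and the `left` index label are made up for illustration and do not come from the answer.

```python
from pathlib import Path

import pandas as pd


def join_csv_files(paths):
    """Outer-join two-column, header-less CSV files on their first column."""
    frames = []
    for path in map(Path, paths):
        # Name the value column after the file (e.g. "csv-1") so columns stay distinct.
        frame = pd.read_csv(path, header=None, names=["left", path.stem])
        frames.append(frame.set_index("left"))
    if not frames:
        return pd.DataFrame()
    first, *rest = frames
    # DataFrame.join accepts a list of frames and joins them all on the index.
    return first.join(rest, how="outer") if rest else first
```

Calling `join_csv_files(["csv-1.csv", "csv-2.csv", "csv-3.csv"])` should give one column per file, indexed by the shared labels. Follow it with `reset_index()` if the labels should become a regular column.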

Answer 2

Score: 1

You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
                  for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())

Output:

>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
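
Note that `pd.concat(..., axis=1)` aligns the frames on their index labels rather than on row position, so files whose rows are in a different order, or that lack some labels, still line up. Here is a small sketch with in-memory data standing in for the CSV files. The contents below are invented for illustration and are not the data from the question.

```python
import io

import pandas as pd

# Hypothetical file contents: the second "file" has a different row order
# and is missing the "Flag" label.
csv_1 = io.StringIO("Value,0\nCurrency,0\nFlag,0\n")
csv_2 = io.StringIO("Currency,1\nValue,0\n")

frames = [
    pd.read_csv(buf, header=None, names=["left", name], index_col="left")
    for name, buf in [("csv-1", csv_1), ("csv-2", csv_2)]
]

# Rows are matched by label; a label missing from one file becomes NaN there.
merged = pd.concat(frames, axis=1).rename_axis("left").reset_index()
print(merged)
```

A column that receives a NaN is upcast to float, so a cleanup step such as `fillna` may be wanted if integer columns matter.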

Answer 3

Score: 1

Here's what you can do:

import pandas as pd
from glob import glob

def refineFilename(path):
    # Strip the extension: "csv-1.csv" -> "csv-1"
    return path.split(".")[0]

df = pd.DataFrame()

# sorted() keeps the column order stable; glob() alone returns files in arbitrary order
for file in sorted(glob("csv-*.csv")):
    new = pd.read_csv(file, header=None, index_col=[0])
    df[refineFilename(file)] = new[1]

df.reset_index(inplace=True)
df.rename(columns={0: "left"}, inplace=True)

print(df)

"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""

Here, df accumulates the data: we iterate over all the files and add the second column of each one to df, using the file name as the column name.
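
If pandas is not available, the same column-per-file table can be built with the standard library alone. This is a rough sketch rather than part of the answer. It assumes two-column files without a header row, and `merge_csv_columns` is a hypothetical helper name.

```python
import csv
from pathlib import Path


def merge_csv_columns(paths, out_path):
    """Write one row per label and one column per input file to out_path."""
    labels = []   # row labels, in first-seen order
    columns = {}  # file stem -> {label: value}
    for path in map(Path, paths):
        with path.open(newline="") as handle:
            values = {row[0]: row[1] for row in csv.reader(handle) if row}
        columns[path.stem] = values
        for label in values:
            if label not in labels:
                labels.append(label)

    with Path(out_path).open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["left", *columns])
        for label in labels:
            # A label missing from a file is written as an empty cell.
            writer.writerow([label, *(col.get(label, "") for col in columns.values())])
```

All values stay strings here, which is fine for writing a merged CSV but means no numeric type inference takes place.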

huangapple
  • Published on February 14, 2023, 00:20:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75438549.html