February 14, 2023 00:20:20

How to merge multiple csv files?

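Question

I have several CSV files that share the same first element in every row. For example:

csv-1.csv:
Value,0
Currency,0
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

csv-2.csv:
Value,0
Currency,1
datetime,0
Receiver,0
Beneficiary,0
Flag,0
idx,0

I want to merge these files (there are more than two, by the way) into a table like this:

| left      | csv-1 | csv-2 |
|:--------- |:-----:|:-----:|
| Value     |   0   |   0   |
| Currency  |   0   |   1   |
| datetime  |   0   |   0   |

How can I write a function in Python that does this?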

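You can write this function in Python. Here is an example:

import pandas as pd
from pathlib import Path

# Build one data frame per file, naming its value column after the file,
# then combine the frames side by side.
def create_merged_dataframe(file_paths):
    dataframes = []

    for file_path in file_paths:
        # The files have no header row; the first column becomes the index
        df = pd.read_csv(file_path, index_col=0, header=None)
        filename = Path(file_path).stem  # file name without directory or extension
        df = df.rename(columns={1: filename})
        dataframes.append(df)

    # Concatenate column-wise; rows are aligned on the shared index
    merged_df = pd.concat(dataframes, axis=1)

    return merged_df

# Pass in the list of file paths
file_paths = ['csv-1.csv', 'csv-2.csv']  # add more file paths here
merged_dataframe = create_merged_dataframe(file_paths)
# Print the merged data frame
print(merged_dataframe)

Add more paths to file_paths to merge more files. The function returns a single data frame in which each file contributes one column, named after the file.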
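The example function leaves the row labels in the index, whereas the table in the question shows them in a `left` column. The sketch below adds that last step and runs against the two sample files from the question, written to a temporary directory so it is self-contained. The helper name `merge_with_left_column` and the `SAMPLES` dictionary are illustrative assumptions, not part of the original post.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Sample file contents copied from the question.
SAMPLES = {
    "csv-1.csv": "Value,0\nCurrency,0\ndatetime,0\nReceiver,0\nBeneficiary,0\nFlag,0\nidx,0\n",
    "csv-2.csv": "Value,0\nCurrency,1\ndatetime,0\nReceiver,0\nBeneficiary,0\nFlag,0\nidx,0\n",
}


def merge_with_left_column(file_paths):
    """Merge headerless two-column CSVs; labels end up in a 'left' column."""
    columns = []
    for file_path in file_paths:
        path = Path(file_path)
        # header=None: no header row; index_col=0: labels become the index.
        frame = pd.read_csv(path, header=None, index_col=0)
        # The remaining column is labelled 1; rename it after the file.
        columns.append(frame[1].rename(path.stem))
    merged = pd.concat(columns, axis=1)
    # Name the index 'left' and turn it into an ordinary column.
    return merged.rename_axis("left").reset_index()


with tempfile.TemporaryDirectory() as tmp_dir:
    paths = []
    for name, text in SAMPLES.items():
        path = Path(tmp_dir) / name
        path.write_text(text)
        paths.append(path)
    merged = merge_with_left_column(paths)

print(merged)
```

Because every file lists the same labels in the same order, the rows keep the order they have in the input files.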


Answer 1

Score: 2

First, index each data frame by its first column, which is the column you will join on:

import pandas as pd

# The files have no header row, so supply the column names explicitly
df1 = pd.read_csv('csv-1.csv', header=None, names=['col1', 'csv-1'])
df2 = pd.read_csv('csv-2.csv', header=None, names=['col1', 'csv-2'])
# Index both frames by the shared label column
df1 = df1.set_index('col1')
df2 = df2.set_index('col1')
# An outer join keeps labels that appear in either file
df = df1.join(df2, how='outer')

Then rename the columns if needed, or create a new index.
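Since the question involves more than two files, the pairwise join can be folded over a list of data frames. This is a minimal sketch, assuming the files have no header row. It uses in-memory strings in place of real files; the helper `load_indexed` and the third sample `csv-3` (with one label deliberately missing) are made up for illustration.

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for the CSV files. "csv-3" is a hypothetical extra
# file that lacks the "Currency" row, to show what an outer join does.
SAMPLES = {
    "csv-1": "Value,0\nCurrency,0\ndatetime,0\n",
    "csv-2": "Value,0\nCurrency,1\ndatetime,0\n",
    "csv-3": "Value,5\ndatetime,7\n",
}


def load_indexed(name, text):
    """Read a headerless two-column CSV, indexed by label, value column named `name`."""
    return pd.read_csv(
        io.StringIO(text),
        header=None,
        names=["left", name],
        index_col="left",
    )


frames = [load_indexed(name, text) for name, text in SAMPLES.items()]

# Join the frames one after another; how="outer" keeps every label seen in any file.
joined = reduce(lambda left, right: left.join(right, how="outer"), frames)

print(joined)
```

Labels missing from a file show up as NaN in that file's column, which also turns the column into floats. An outer join may reorder the rows, so do not rely on the original file order here.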

Answer 2

Score: 1

You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
                  for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
         .rename_axis('left').reset_index())

Output:

>>> out
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
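One caveat with globbing `*.csv` in the working directory: if the merged table is later saved there as a CSV, a rerun would pick it up as an input. The sketch below shows one way around that, assuming the inputs follow the `csv-*.csv` naming from the question. The function name `merge_directory` and the output name `merged.csv` are illustrative choices, and the temporary directory stands in for a real data folder.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_directory(directory, pattern="csv-*.csv"):
    """Merge all headerless two-column CSVs in `directory` that match `pattern`."""
    files = sorted(Path(directory).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} in {directory}")
    frames = [
        # Name both columns explicitly: the label column and the value column.
        pd.read_csv(f, header=None, names=["left", f.stem], index_col="left")
        for f in files
    ]
    return pd.concat(frames, axis=1).rename_axis("left").reset_index()


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "csv-1.csv").write_text("Value,0\nCurrency,0\n")
    (tmp / "csv-2.csv").write_text("Value,0\nCurrency,1\n")

    first = merge_directory(tmp)
    # Saving the result next to the inputs is safe because "merged.csv"
    # does not match the "csv-*.csv" pattern.
    first.to_csv(tmp / "merged.csv", index=False)
    second = merge_directory(tmp)

print(second)
```

Note that `sorted` orders paths lexicographically, so a file named `csv-10.csv` would come before `csv-2.csv`.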

Answer 3

Score: 1

Here's what you can do:

import pandas as pd
from glob import glob

def refineFilename(path):
    # Strip the extension: "csv-1.csv" -> "csv-1"
    return path.split(".")[0]

df = pd.DataFrame()
# Sort so that the columns appear in a stable order
for file in sorted(glob("csv-*.csv")):
    # No header row; the first column becomes the index
    new = pd.read_csv(file, header=None, index_col=[0])
    # Add the file's value column under the file's name
    df[refineFilename(file)] = new[1]
df.reset_index(inplace=True)
df.rename(columns={0: "left"}, inplace=True)
print(df)
"""
          left  csv-1  csv-2
0        Value      0      0
1     Currency      0      1
2     datetime      0      0
3     Receiver      0      0
4  Beneficiary      0      0
5         Flag      0      0
6          idx      0      0
"""

Here, df accumulates all the data: we iterate over the files and add the second column of each file to df, using the file name (without its extension) as the column name.
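One detail of the column-assignment approach is worth seeing in isolation: the first file fixes the set of row labels, and later columns are aligned to it, so a label that appears only in a later file is silently dropped. The sketch below contrasts this with `pd.concat`, which keeps the union of labels. The sample data (including the extra `Flag` row) and the helper `read_pair` are assumptions made up for the demonstration.

```python
import io

import pandas as pd


def read_pair(text):
    """Read a headerless two-column CSV from a string, indexed by its labels."""
    return pd.read_csv(io.StringIO(text), header=None, index_col=[0])


first = read_pair("Value,0\nCurrency,0\n")
second = read_pair("Value,0\nCurrency,1\nFlag,9\n")  # "Flag" exists only here

# Column assignment: the first assignment sets the index of the empty frame,
# and every later assignment is reindexed to match it.
assigned = pd.DataFrame()
assigned["csv-1"] = first[1]
assigned["csv-2"] = second[1]

# Concatenation: rows are aligned on the union of all labels by default.
combined = pd.concat(
    [first[1].rename("csv-1"), second[1].rename("csv-2")],
    axis=1,
)

print(assigned)
print(combined)
```

If every file is guaranteed to contain the same labels, as in the question, both approaches give the same table.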
