如何合并多个CSV文件?

huangapple go评论90阅读模式
英文:

How to merge multiple csv files?

问题

我有几个CSV文件,它们的第一行元素相同。例如:

  1. csv-1.csv:
  2. Value,0
  3. Currency,0
  4. datetime,0
  5. Receiver,0
  6. Beneficiary,0
  7. Flag,0
  8. idx,0
  9. csv-2.csv:
  10. Value,0
  11. Currency,1
  12. datetime,0
  13. Receiver,0
  14. Beneficiary,0
  15. Flag,0
  16. idx,0

有了这些文件(不止两个文件),我想要合并它们并创建类似以下的表格:

  1. | left | csv-1 | csv-2 |
  2. |:--------- |:-----:| -----:|
  3. | Value | 0 | 0 |
  4. | Currency | 0 | 1 |
  5. | datetime | 0 | 0 |

你可以使用Python创建这个函数。以下是一个示例函数:

  1. import pandas as pd
  2. # 以文件名和列名作为参数创建一个数据帧,并将列名重命名为文件名
  3. def create_merged_dataframe(file_paths):
  4. dataframes = []
  5. for file_path in file_paths:
  6. df = pd.read_csv(file_path, index_col=0, header=None)
  7. filename = file_path.split('/')[-1].split('.')[0] # 获取文件名
  8. df = df.rename(columns={1: filename})
  9. dataframes.append(df)
  10. # 使用 reduce 函数逐个合并数据帧
  11. merged_df = pd.concat(dataframes, axis=1)
  12. return merged_df
  13. # 传入文件路径列表
  14. file_paths = ['csv-1.csv', 'csv-2.csv'] # 添加更多文件路径
  15. merged_dataframe = create_merged_dataframe(file_paths)
  16. # 打印合并后的数据帧
  17. print(merged_dataframe)

你可以将更多的文件路径添加到file_paths列表中以合并更多文件。这个函数将为你创建一个合并后的数据帧,其中每个文件的数据将按列的方式进行合并,并且列名将对应文件的名称。

英文:

I have several csv files that has same first row element in it.
For example:

  1. csv-1.csv:
  2. Value,0
  3. Currency,0
  4. datetime,0
  5. Receiver,0
  6. Beneficiary,0
  7. Flag,0
  8. idx,0
  9. csv-2.csv:
  10. Value,0
  11. Currency,1
  12. datetime,0
  13. Receiver,0
  14. Beneficiary,0
  15. Flag,0
  16. idx,0

And with these files (more than 2 files by the way) I want to merge them and create something like that:

left csv-1 csv-2
Value 0 0
Currency 0 1
datetime 0 0

How can I create this funtion in python?

答案1

得分: 2

首先,你必须通过首列在数据框中创建索引,以便进一步进行连接:

  1. import pandas as pd
  2. import numpy as np
  3. df1 = pd.read_csv('csv-1.csv')
  4. df2 = pd.read_csv('csv-2.csv')
  5. df1 = df1.set_index('col1')
  6. df2 = df2.set_index('col1')
  7. df = df1.join(df2, how='outer')

然后,如果需要,可以重新命名列名或创建一个新的索引。

英文:

First, you must create indexes in dataframes by first columns, on which you will further join:

  1. import pandas as pd
  2. import numpy as np
  3. df1 = pd.read_csv('csv-1.csv')
  4. df2 = pd.read_csv('csv-2.csv')
  5. df1 = df1.set_index('col1')
  6. df2 = df2.set_index('col1')
  7. df = df1.join(df2, how='outer')

Then rename the column names if needed, or make a new index

答案2

得分: 1

  1. 你可以使用以下代码:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
.rename_axis('left').reset_index())

  1. 输出结果如下:

out
left csv-1 csv-2
0 Value 0 0
1 Currency 0 1
2 datetime 0 0
3 Receiver 0 0
4 Beneficiary 0 0
5 Flag 0 0
6 idx 0 0

  1. <details>
  2. <summary>英文:</summary>
  3. You can use:

import pandas as pd
import pathlib

out = (pd.concat([pd.read_csv(csvfile, header=None, index_col=[0], names=[csvfile.stem])
for csvfile in sorted(pathlib.Path.cwd().glob('*.csv'))], axis=1)
.rename_axis('left').reset_index())

  1. Output:

>>> out
left csv-1 csv-2
0 Value 0 0
1 Currency 0 1
2 datetime 0 0
3 Receiver 0 0
4 Beneficiary 0 0
5 Flag 0 0
6 idx 0 0

答案3

得分: 1

以下是已翻译的内容:

这里是你可以做的事情

  1. import pandas as pd
  2. from glob import glob
  3. def refineFilename(path):
  4. return path.split(".")[0]
  5. df=pd.DataFrame()
  6. for file in glob("csv-*.csv"):
  7. new=pd.read_csv(file,header=None,index_col=[0])
  8. df[refineFilename(file)]=new[1]
  9. df.reset_index(inplace=True)
  10. df.rename(columns={0:"left"},inplace=True)
  11. print(df)

我们在这里所做的是使df变量存储所有数据,然后迭代所有文件,并将这些文件的第二列添加到df中,列名以文件名命名。

英文:

Here's what you can do

  1. import pandas as pd
  2. from glob import glob
  3. def refineFilename(path):
  4. return path.split(&quot;.&quot;)[0]
  5. df=pd.DataFrame()
  6. for file in glob(&quot;csv-*.csv&quot;):
  7. new=pd.read_csv(file,header=None,index_col=[0])
  8. df[refineFinename(file)]=new[1]
  9. df.reset_index(inplace=True)
  10. df.rename(columns={0:&quot;left&quot;},inplace=True)
  11. print(df)
  12. &quot;&quot;&quot;
  13. left csv-1 csv-2
  14. 0 Value 0 0
  15. 1 Currency 0 1
  16. 2 datetime 0 0
  17. 3 Receiver 0 0
  18. 4 Beneficiary 0 0
  19. 5 Flag 0 0
  20. 6 idx 0 0
  21. &quot;&quot;&quot;

What we are doing here is making the df variable store all data, and then iterating through all files and adding a second column of those files to df with file name as the column name. 

huangapple
  • 本文由 发表于 2023年2月14日 00:20:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/75438549.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定