May 22, 2023 08:26:18 · go


How can we merge column headers from multiple CSVs into one dataframe, and list source file names for each file in one column?

Question


Here is the code that I am testing.

# import necessary libraries
import pandas as pd
import os
import glob

# use glob to get all the csv files
# in the folder
path = 'C:\\Users\\'
csv_files = glob.glob(os.path.join(path, "*.csv"))
csv_files
df_headers = pd.DataFrame()
# loop over the list of csv files
for f in csv_files:
    #print(type(f))
    # read the csv file
    df = pd.read_csv(f, nrows=1)
    print(df.shape)

    df_headers = pd.concat([df_headers, df], axis=0)
    df_headers['file_name'] = f

df_headers.to_csv('C:\\Users\\ryans\\Desktop\\out.csv')

This almost works, but it always writes the name of the last file into every row of df_headers['file_name']. As a result, only the last file the loop processes ends up listed in 'file_name'.

Answer 1

Score: 1


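This happens because assigning a single (scalar) value to a whole column repeats that value in every row. Inside your loop, each df_headers['file_name'] = f therefore overwrites the file name of every row accumulated so far, not just the row that was just added.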

For example, suppose your CSV files are:

csv_files = ['1.csv', '2.csv', '3.csv']

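1st iteration

Your df will look like this:

|...some columns...|file_name|
|...some values...|'1.csv'|

2nd iteration

Your df will look like this:

|...some columns...|file_name|
|...some values...|'2.csv'|
|...some values...|'2.csv'|

And so on: after the third iteration, every row holds '3.csv'.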
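Here is a minimal in-memory reproduction of the loop from the question. It assumes that three hand-built one-row DataFrames stand in for what pd.read_csv(f, nrows=1) would return. The column names a and b and the file names are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for pd.read_csv(f, nrows=1) on three files
frames = {
    "1.csv": pd.DataFrame({"a": [1]}),
    "2.csv": pd.DataFrame({"a": [2], "b": [20]}),
    "3.csv": pd.DataFrame({"b": [30]}),
}

df_headers = pd.DataFrame()
for name, df in frames.items():
    df_headers = pd.concat([df_headers, df], axis=0)
    # Scalar assignment: overwrites file_name on EVERY row accumulated so far
    df_headers["file_name"] = name

print(df_headers["file_name"].tolist())  # ['3.csv', '3.csv', '3.csv']
```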

When you assign a single value to a column, that value is written to every row of the column. For example, take a DataFrame like this:

|A|B| 
|1|2| 
|3|4|

If you then run

df['B'] = 5

all values in B become 5, so the DataFrame becomes:

|A|B| 
|1|5| 
|3|5|
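The same rule explains why the fix below works. A scalar is broadcast to every row, whereas a list is assigned element by element and must contain exactly one entry per row. This small sketch uses the A/B frame from the example above.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 3], "B": [2, 4]})

# Scalar: broadcast to every row
df["B"] = 5
print(df["B"].tolist())  # [5, 5]

# List: assigned positionally, one value per row
df["B"] = [7, 8]
print(df["B"].tolist())  # [7, 8]

# A list whose length differs from the number of rows is rejected
try:
    df["B"] = [1, 2, 3]
except ValueError as exc:
    print("ValueError:", exc)
```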

One fix for your case is to assign the complete list of file names once, after the loop, so that each row gets its own name:

# import necessary libraries
import pandas as pd
import os
import glob

# use glob to get all the csv files
# in the folder
path = 'C:\\Users\\'
csv_files = glob.glob(os.path.join(path, "*.csv"))
csv_files
df_headers = pd.DataFrame()
# loop over the list of csv files
for f in csv_files:
    # read the header plus the first data row of the csv file
    df = pd.read_csv(f, nrows=1)
    print(df.shape)
    df_headers = pd.concat([df_headers, df], axis=0)

# assign the whole list once, after the loop: one file name per row
# (this assumes every file contributed exactly one row)
df_headers['file_name'] = csv_files

df_headers.to_csv('C:\\Users\\ryans\\Desktop\\out.csv')
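Assigning the list after the loop only works if every file contributes exactly one row. A CSV that has a header but no data rows yields zero rows with nrows=1, and the list length then no longer matches. The sketch below shows one variant that avoids this. It tags each file's frame with its name before concatenating and calls pd.concat once at the end rather than once per file. The helper name collect_headers is hypothetical, and header-only files are padded to a single all-NaN row so that every file still gets a row. Note that a completely empty file (0 bytes) still makes pd.read_csv raise pandas.errors.EmptyDataError.

```python
import glob
import os

import pandas as pd


def collect_headers(folder: str) -> pd.DataFrame:
    """Hypothetical helper: stack the first data row of every CSV in `folder`,
    one row per file, with a `file_name` column recording its source."""
    frames = []
    for f in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        df = pd.read_csv(f, nrows=1)  # header plus at most one data row
        if df.empty:
            # Header-only file: keep its columns as a single all-NaN row
            df = df.reindex([0])
        # Tag this file's own row(s) before concatenating, so nothing is overwritten later
        df["file_name"] = f
        frames.append(df)
    # A single concat at the end; ignore_index gives a clean 0..n-1 index
    return pd.concat(frames, axis=0, ignore_index=True) if frames else pd.DataFrame()
```

With the question's paths, usage would be collect_headers(path).to_csv('C:\\Users\\ryans\\Desktop\\out.csv', index=False). Columns missing from a given file show up as NaN in that file's row, so the output is the union of all headers.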
