February 6, 2023, 05:58:17

How to read multiple CSV files with specific names from a folder and merge them?

Question

I am trying to read multiple files with specific names (1.car.csv, 2.car.csv, and so on) from a folder, add a new label as the right-most column of each dataset on every iteration, and merge all of the CSV files into one CSV file. Since ".car.csv" is constant, I think I can use a for loop with .format(index) to iterate over the files. All of the CSV files have the same attributes.

Kindly help me!

Answer 1

Score: 2

glob is used to get all files in the folder that match the pattern *.csv. To pick up only the files from the question, narrow the pattern to *.car.csv.
pd.read_csv is used to read each file as a DataFrame.
- index_col=None tells pandas not to use any column as the index and to create a default index for the DataFrame instead.
- header=0 tells pandas to use the first row of the CSV file as the header row.
pd.concat is used to merge all the DataFrames into a single DataFrame, merged_df.
- axis=0 means the concatenation happens along the rows (vertically).
- ignore_index=True discards the original indices of the individual DataFrames and creates a new default index for the resulting DataFrame.

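import glob
import pandas as pd

# Folder that contains the CSV files
path = r'<path to folder containing csv files>'
# Collect every file in the folder whose name ends in .csv
all_files = glob.glob(path + "/*.csv")
lst = []
for filename in all_files:
    # Read each file with its first row as the header and a default index
    df = pd.read_csv(filename, index_col=None, header=0)
    lst.append(df)
# Stack the DataFrames vertically and renumber the rows from 0
merged_df = pd.concat(lst, axis=0, ignore_index=True)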
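
The snippet above merges the files but does not add the per-file label the question asks for. Below is a minimal sketch of one way to do that, assuming the files are named "<number>.car.csv" and that the label should simply be that number. The function name merge_car_files, the pattern CAR_FILE, and the column name label are hypothetical and not part of the original answer.

```python
import glob
import os
import re

import pandas as pd

# Assumed naming rule taken from the question: "<number>.car.csv"
CAR_FILE = re.compile(r"(\d+)\.car\.csv")


def merge_car_files(folder, label_column="label"):
    """Merge every '<number>.car.csv' in `folder`, labelling rows with the file number."""
    numbered = []
    # Match only names ending in .car.csv rather than every CSV in the folder.
    for filename in glob.glob(os.path.join(folder, "*.car.csv")):
        match = CAR_FILE.fullmatch(os.path.basename(filename))
        if match:
            numbered.append((int(match.group(1)), filename))
    if not numbered:
        raise FileNotFoundError(f"no '<number>.car.csv' files found in {folder!r}")

    frames = []
    # Sort numerically so that 10.car.csv comes after 2.car.csv.
    for number, filename in sorted(numbered):
        df = pd.read_csv(filename)
        # Assigning a new column places it after the existing ones (right-most).
        df[label_column] = number
        frames.append(df)
    return pd.concat(frames, ignore_index=True)
```

A call such as merge_car_files("path/to/car/files").to_csv("merged.csv", index=False) would then write the combined file. Sorting by the parsed number is a design choice: glob.glob makes no promise about the order in which it returns files.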

Answer 2

Score: 0

This can be done easily with a CSV tool like Miller:

mlr --csv cat --filename bla1.csv *.car.csv

This will concatenate the files (without repeating the header) and prepend the filename as the first column.
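
For readers without Miller installed, roughly the same result (concatenate the files, write the header once, put the file name in the first column) can be sketched with Python's standard csv module. This is not how Miller works internally. The function name concat_with_filename and the column name filename are assumptions, the sketch records only the base name of each file, and it assumes every file has the same header.

```python
import csv
from pathlib import Path


def concat_with_filename(input_paths, output_path, column="filename"):
    """Concatenate CSV files, writing the header once and prepending
    each row's source file name as the first column."""
    header = None
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in input_paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # empty file: nothing to copy
                if header is None:
                    # First non-empty file: emit the shared header once.
                    header = file_header
                    writer.writerow([column] + header)
                elif file_header != header:
                    raise ValueError(f"{path} has a different header")
                for row in reader:
                    writer.writerow([Path(path).name] + row)
```

It could be called as concat_with_filename(sorted(Path("path/to/car/files").glob("*.car.csv")), "merged.csv"). Note that sorted() orders paths alphabetically, so 10.car.csv would come before 2.car.csv.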

Answer 3

Score: 0

Using pathlib and pandas, you can use .assign() to add the new column and then pd.concat() to concatenate all the files into one.

from pathlib import Path
import pandas as pd

# Lazily find every file in the folder whose name ends in car.csv
input_path = Path("path/to/car/files/").glob("*car.csv")
output_path = "path/to/output"
# Read each file, add the new_label column, merge, and write the result
pd.concat(
    (pd.read_csv(x).assign(new_label="new data") for x in input_path), ignore_index=True
).to_csv(f"{output_path}/final.csv", index=False)
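
To see what .assign() and ignore_index=True do without touching the file system, here is a small in-memory sketch. The two DataFrames stand in for 1.car.csv and 2.car.csv. The column names speed and source are made up for illustration, and each frame gets its own label here rather than the constant "new data" used above.

```python
import pandas as pd

# Two small in-memory frames standing in for the CSV files (made-up data)
frames = {
    "1.car.csv": pd.DataFrame({"speed": [10, 20]}),
    "2.car.csv": pd.DataFrame({"speed": [30]}),
}

# .assign() returns a copy with the new column appended after the existing
# ones; the original frames are left unchanged.
merged = pd.concat(
    (df.assign(source=name) for name, df in frames.items()),
    ignore_index=True,  # renumber the rows 0..n-1 instead of keeping 0, 1, 0
)

print(merged)
```

Because .assign() works on a copy, the same frames can be reused elsewhere without carrying the extra column.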
