How to read multiple CSV files with a specific name from a folder and merge them?

Question

I am trying to read multiple files with a specific name pattern (1.car.csv, 2.car.csv, and so on) from a folder, add a new label column at the far right of each dataset on every iteration, and then merge all the CSV files into a single CSV file. Since the ".car.csv" suffix is constant, I think I can use a for loop with .format(index) to iterate over the files. All of the CSV files have the same columns.

Kindly help me!
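
The loop described in the question could look roughly like the sketch below. It assumes the files are numbered consecutively from 1 and that pandas is installed; the function name merge_numbered_csvs, the count argument, and the column name "label" are made up for illustration and do not come from the question.

```python
from pathlib import Path

import pandas as pd


def merge_numbered_csvs(folder, count, label_column="label"):
    """Read 1.car.csv .. <count>.car.csv from folder and stack them (hypothetical helper)."""
    frames = []
    for index in range(1, count + 1):
        # Build the file name from the loop index, as the question suggests.
        path = Path(folder) / "{}.car.csv".format(index)
        if not path.exists():
            continue  # skip gaps in the numbering instead of failing
        df = pd.read_csv(path)
        # Assigning a new column appends it as the right-most column.
        df[label_column] = index
        frames.append(df)
    # Note: pd.concat raises ValueError if no file was found at all.
    return pd.concat(frames, ignore_index=True)
```

Calling merged = merge_numbered_csvs("path/to/folder", 10) followed by merged.to_csv("merged.csv", index=False) would write the combined result; both paths are placeholders.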

Answer 1

Score: 2

  • glob collects every file in the folder that matches the pattern *.csv.
  • pd.read_csv reads each file into a DataFrame.
    • index_col=None tells pandas not to use any column as the index and to create a default index instead.
    • header=0 tells pandas to use the first row of the CSV file as the header row.
  • pd.concat merges all the DataFrames into a single DataFrame, merged_df.
    • axis=0 concatenates along the rows (vertically).
    • ignore_index=True discards the original indices of the individual DataFrames and creates a new default index for the result.

import glob
import pandas as pd

path = r'<path to folder containing csv files>'
all_files = glob.glob(path + "/*.csv")  # every CSV file in the folder

lst = []

for filename in all_files:
    # Read each file with its first row as the header and a default index
    df = pd.read_csv(filename, index_col=None, header=0)
    lst.append(df)

# Stack all DataFrames vertically and renumber the rows
merged_df = pd.concat(lst, axis=0, ignore_index=True)
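
The code above matches every *.csv file and does not add the label column the question asks for. A possible variation is sketched below: it narrows the pattern to *.car.csv, sorts the files by their numeric prefix (glob returns them in arbitrary order), and appends a per-file label. The function name merge_car_files and the column name "source_id" are illustrative assumptions, and the sketch assumes every matching file name starts with an integer.

```python
import glob
import os

import pandas as pd


def numeric_prefix(path):
    """Return the integer before the first dot, e.g. 10 for '10.car.csv'."""
    # Assumption: every matching file starts with an integer; int() raises otherwise.
    return int(os.path.basename(path).split(".")[0])


def merge_car_files(folder, label_column="source_id"):
    """Merge all <n>.car.csv files in folder, labelling rows by file number (hypothetical helper)."""
    pattern = os.path.join(folder, "*.car.csv")
    frames = []
    # Sort numerically so that 10.car.csv comes after 2.car.csv.
    for filename in sorted(glob.glob(pattern), key=numeric_prefix):
        df = pd.read_csv(filename, index_col=None, header=0)
        df[label_column] = numeric_prefix(filename)  # new right-most column
        frames.append(df)
    return pd.concat(frames, axis=0, ignore_index=True)
```

Sorting by the numeric prefix matters only if row order in the merged file is important; a plain sorted() would place 10.car.csv before 2.car.csv.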

Answer 2

Score: 0

This can be done easily with a CSV tool such as Miller:

mlr --csv cat --filename bla1.csv *.car.csv

This concatenates the files (without repeating the header) and prepends the filename as the first column.
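
For readers without Miller installed, roughly the same result can be sketched with Python's standard csv module. This is only an approximation of the behaviour described above, not a reimplementation of Miller; the function name concat_with_filename and the "filename" header are assumptions made for this example, and the sketch assumes all files share the same header.

```python
import csv
from pathlib import Path


def concat_with_filename(folder, output_file, pattern="*.car.csv"):
    """Concatenate matching CSV files, writing the header once (hypothetical helper)."""
    # Collect the inputs first so the output file can never be picked up as an input.
    paths = sorted(Path(folder).glob(pattern))
    header_written = False
    with open(output_file, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # empty file, nothing to copy
                if not header_written:
                    # Assumes every file has the same header as the first one.
                    writer.writerow(["filename"] + header)
                    header_written = True
                for row in reader:
                    # Prepend the source file name as the first column.
                    writer.writerow([path.name] + row)
```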

Answer 3

Score: 0

Using pathlib and pandas, you can add the new column with .assign() and then combine all the files into one with pd.concat().

from pathlib import Path

import pandas as pd

# Lazily iterate over every file whose name ends in "car.csv"
input_path = Path("path/to/car/files/").glob("*car.csv")
output_path = "path/to/output"

# Read each file, add the new column, stack the results, and write one CSV
pd.concat(
    (pd.read_csv(x).assign(new_label="new data") for x in input_path), ignore_index=True
).to_csv(f"{output_path}/final.csv", index=False)
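
The snippet above gives every row the same constant value. If each file should carry its own label instead, one option is to derive it from the file name inside .assign(), as in the sketch below. The function name merge_with_file_labels is hypothetical, and taking the text before the first dot as the label is an assumption about what the label should be.

```python
from pathlib import Path

import pandas as pd


def merge_with_file_labels(input_dir, output_file):
    """Merge *.car.csv files, labelling rows with their file's prefix (hypothetical helper)."""
    files = sorted(Path(input_dir).glob("*.car.csv"))
    merged = pd.concat(
        # "1.car.csv" -> label "1"; the label is kept as a string here.
        (pd.read_csv(f).assign(new_label=f.name.split(".")[0]) for f in files),
        ignore_index=True,
    )
    merged.to_csv(output_file, index=False)
    return merged
```

Because .assign() returns a new DataFrame, the whole pipeline stays a single expression, as in the original answer.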

huangapple
  • Posted on February 6, 2023, 05:58:17
  • Please keep the link to this article when reposting: https://go.coder-hub.com/75355804.html