May 10, 2023, 20:37:25


How to import all CSV files from one folder in chronological order with Python?

Question

I have around 2000 CSV files in a folder and want to read them in chronological order. They are named with numbers, so I thought it would be easy.

I am reading them with the following code. I imagine there is a very simple solution, such as a parameter for this, but I haven't found anything.

import os
import pandas as pd

def csv_to_df():
    dff_all_from_csv = []

    # os.walk yields the files of each directory in arbitrary order
    for root, dirs, files in os.walk("output/csv_files"):
        for file in files:
            df = pd.read_csv(os.path.join(root, file))
            dff_all_from_csv.append(df)
    return dff_all_from_csv


Answer 1

Score: 2

You can split the filename and use the stem (the number before the extension) as the sorting key:

def csv_to_df():
    dff_all_from_csv = []
    
    for root, dirs, files in os.walk("output/csv_files"):
        for file in sorted(files, key=lambda x: int(x.split(".")[0])): # <- line updated
            df = pd.read_csv(os.path.join(root, file))
            dff_all_from_csv.append(df)
    return dff_all_from_csv

Or use natsorted from the natsort package:

#pip install natsort
from natsort import natsorted
...
for root, dirs, files in os.walk("output/csv_files"):
    for file in natsorted(files): # <- line updated
        ...
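
Note that int(x.split(".")[0]) raises ValueError as soon as the folder contains a name that does not start with a plain number (a stray notes.txt, or something like part_3.csv). If that can happen, a more tolerant key may help. The following is only a sketch using the standard library; the helper name numeric_key and the sample file names are made up for illustration and are not part of the original answer:

```python
import re

def numeric_key(filename):
    """Sort key: first run of digits in the name, compared as an integer.

    Names without any digits are placed after all numbered names.
    """
    match = re.search(r"\d+", filename)
    if match is None:
        return (1, 0, filename)
    return (0, int(match.group()), filename)

# Hypothetical file names, only to show the resulting order.
names = ["10.csv", "2.csv", "notes.txt", "1.csv", "part_3.csv"]
ordered = sorted(names, key=numeric_key)
print(ordered)
```

In the loop above, sorted(files, key=numeric_key) would then take the place of the lambda.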


Answer 2

Score: 0

You can try reading the files by number (this example covers 1.csv to 4.csv):

import pandas as pd

# Read 1.csv, 2.csv, 3.csv, 4.csv in numeric order.
# DataFrame.append was removed in pandas 2.0, so the frames are
# collected in a list and combined once with pd.concat.
frames = [pd.read_csv(str(i) + '.csv') for i in range(1, 5)]
all_csv_df = pd.concat(frames)

all_csv_df


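Answer 3

Score: 0

You can use pathlib and the lstat() method to sort your files by modification time (st_mtime) or by st_ctime, which is the creation time on Windows but the time of the last metadata change on Unix: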

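import pathlib
import pandas as pd

# DATA_DIR must be a Path object (not a plain string) for .glob() to exist
DATA_DIR = pathlib.Path('output/csv_files')
dff_all_from_csv = [pd.read_csv(f) for f in sorted(DATA_DIR.glob('*.csv'),
                                                   key=lambda x: x.lstat().st_mtime)]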
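
Sorting by timestamp only gives the intended order if the timestamps still reflect the order in which the files were produced; copying or re-saving files can change them. A quick way to check this is to compare the timestamp order with the numeric name order. The sketch below assumes names like 1.csv, 2.csv; the function names and the demo files in a temporary directory are invented for illustration:

```python
import os
import pathlib
import tempfile

def order_by_name(folder):
    """CSV paths sorted by the integer in the file stem (assumes names like 1.csv)."""
    return sorted(pathlib.Path(folder).glob("*.csv"), key=lambda p: int(p.stem))

def order_by_mtime(folder):
    """CSV paths sorted by last-modification time, oldest first."""
    return sorted(pathlib.Path(folder).glob("*.csv"), key=lambda p: p.stat().st_mtime)

# Demo in a throwaway directory: 2.csv gets an older timestamp than 1.csv.
with tempfile.TemporaryDirectory() as tmp:
    demo = [("1.csv", 1_100_000_000), ("2.csv", 1_000_000_000), ("10.csv", 1_200_000_000)]
    for name, mtime in demo:
        path = pathlib.Path(tmp, name)
        path.write_text("a,b\n1,2\n")
        os.utime(path, (mtime, mtime))  # set access and modification time explicitly
    by_name = [p.name for p in order_by_name(tmp)]
    by_mtime = [p.name for p in order_by_mtime(tmp)]
    orders_agree = by_name == by_mtime

print(by_name, by_mtime, orders_agree)
```

If the two orders disagree on the real folder, sorting by the number in the file name is probably the safer choice.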


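Answer 4

Score: 0

You can retrieve the last-modification time of a CSV file using os.path.getmtime(). Collect these timestamps in a list, sort it, and then read the DataFrames in the sorted order.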

import os
import time
import pandas as pd
path_to_csv_files = "./csv_files/"
# list in which we'll store the name and the last modification date of each csv file
metadata = list()
for _, _, files in os.walk("./csv_files"):
   for name in files:
      # retrieve the last modification date and format it so it is numerically sortable
      # (despite the variable name, getmtime returns the modification time, not the creation time)
      creation_date = time.strftime("%Y%m%d%H%M%S", time.gmtime(os.path.getmtime(f"{path_to_csv_files}{name}")))
      # turn it into an int so we can sort the metadata by date
      creation_date = int(creation_date)
      metadata.append((name, creation_date))
# sort the metadata by date
metadata = sorted(
    metadata, 
    key=lambda x: x[1]
    )
# list of dataframes placed in date order
list_of_df_from_csv = list()
for name, _ in metadata:
   path_to_csv = path_to_csv_files + name
   df = pd.read_csv(path_to_csv)
   list_of_df_from_csv.append(df)
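
The formatting step is not strictly needed: os.path.getmtime() returns seconds since the epoch as a float, which already sorts chronologically and keeps sub-second resolution. A shorter variant could look like the sketch below. The function name files_by_mtime and the demo files are hypothetical, and the pd.read_csv step is left out so the example runs without pandas:

```python
import os
import tempfile

def files_by_mtime(folder):
    """Return the CSV file names in `folder`, oldest modification time first."""
    stamped = []
    for name in os.listdir(folder):
        if name.endswith(".csv"):
            # (timestamp, name) tuples sort by timestamp, then by name on ties
            stamped.append((os.path.getmtime(os.path.join(folder, name)), name))
    return [name for _, name in sorted(stamped)]

# Demo in a throwaway directory with explicitly set timestamps.
with tempfile.TemporaryDirectory() as tmp:
    demo = [("a.csv", 1_000_000_300), ("b.csv", 1_000_000_100),
            ("c.csv", 1_000_000_200), ("skip.txt", 1_000_000_050)]
    for name, mtime in demo:
        path = os.path.join(tmp, name)
        with open(path, "w") as handle:
            handle.write("x\n1\n")
        os.utime(path, (mtime, mtime))  # set access and modification time
    result = files_by_mtime(tmp)

print(result)
```

Each returned name can then be joined with the folder path and passed to pd.read_csv, as in the loop above.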


Answer 5

Score: 0

I tried something like this and it works perfectly:

import os
import pandas as pd
def csv_to_df():
    
    folder_path = "output/csv_files"
    
    files = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.csv')]
    files = sorted(files, key=os.path.getmtime)
    
    dff_all_from_csv = []
    for file in files:
        df = pd.read_csv(file)
        dff_all_from_csv.append(df)
    
    return dff_all_from_csv

