How to import all CSV files from one folder in chronological order with Python?
Question
I have around 2000 CSV files in my folder and I want to read them in chronological order. They are named with numbers, so I thought it would be easy.
I am reading them in with the following code. I imagine there is a very simple solution, since there must be an easy parameter for this, but I haven't found anything :(((
import os
import pandas as pd

def csv_to_df():
    dff_all_from_csv = []
    for root, dirs, files in os.walk("output/csv_files"):
        # os.walk yields file names in arbitrary (filesystem-dependent) order
        for file in files:
            df = pd.read_csv(os.path.join(root, file))
            dff_all_from_csv.append(df)
    return dff_all_from_csv
Answer 1
Score: 2
You can split the filename and use the stem (the number) as the sorting key:
import os
import pandas as pd

def csv_to_df():
    dff_all_from_csv = []
    for root, dirs, files in os.walk("output/csv_files"):
        # "12.csv" -> "12" -> 12, so files are read in numeric order
        for file in sorted(files, key=lambda x: int(x.split(".")[0])):  # <- line updated
            df = pd.read_csv(os.path.join(root, file))
            dff_all_from_csv.append(df)
    return dff_all_from_csv
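The numeric key matters because a plain sorted() compares file names as strings, so "10.csv" lands before "2.csv". A quick check with hypothetical file names (stdlib only; pathlib.Path.stem is used here as an equivalent way to get the number part):

```python
from pathlib import Path

# Hypothetical file names; in practice these come from os.walk or os.listdir
names = ["10.csv", "2.csv", "1.csv", "100.csv"]

string_order = sorted(names)                                    # compares characters
numeric_order = sorted(names, key=lambda n: int(Path(n).stem))  # compares integers

print(string_order)   # ['1.csv', '10.csv', '100.csv', '2.csv']
print(numeric_order)  # ['1.csv', '2.csv', '10.csv', '100.csv']
```

Note that int() raises ValueError for any name whose stem is not purely a number, which is one reason to reach for natural sorting instead.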
Or use natsorted from the natsort package:
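# pip install natsort
import os
import pandas as pd
from natsort import natsorted

def csv_to_df():
    dff_all_from_csv = []
    for root, dirs, files in os.walk("output/csv_files"):
        for file in natsorted(files):  # <- line updated
            df = pd.read_csv(os.path.join(root, file))
            dff_all_from_csv.append(df)
    return dff_all_from_csv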
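natsort is a third-party package. If you would rather avoid the dependency, a rough stdlib-only approximation of natural sorting splits each name into text and digit runs and compares the digit runs as integers. This is a hypothetical helper (natural_key), not natsort's actual algorithm:

```python
import re

def natural_key(name):
    """Hypothetical helper: split a name into text and digit runs so digit runs compare as integers."""
    # With a capturing group, re.split puts text at even indices and digit runs at odd indices,
    # so two keys always compare str with str and int with int (no TypeError on mixed names).
    return [int(chunk) if i % 2 else chunk.lower()
            for i, chunk in enumerate(re.split(r"(\d+)", name))]

files = ["file10.csv", "File2.csv", "file1.csv"]
print(sorted(files, key=natural_key))  # ['file1.csv', 'File2.csv', 'file10.csv']
```

It would be used as sorted(files, key=natural_key) in place of natsorted(files); natsort itself handles more cases (signed numbers, floats, locale-aware ordering) when configured to.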
Answer 2
Score: 0
You can try:
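import pandas as pd

# Read 1.csv, 2.csv, 3.csv, 4.csv in numeric order (raise the upper bound of range() to cover all files)
frames = [pd.read_csv(f"{i}.csv") for i in range(1, 5)]
# DataFrame.append() was removed in pandas 2.0, so the frames are combined with pd.concat()
all_csv_df = pd.concat(frames)
all_csv_df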
Answer 3
Score: 0
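You can use pathlib and the lstat() method to sort your files by creation time (st_ctime) or modification time (st_mtime). Note that st_ctime is the creation time only on Windows; on Unix it is the time of the last metadata change.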
import pathlib
import pandas as pd

DATA_DIR = pathlib.Path('output/csv_files')  # must be a Path, not a str, to call .glob()
dff_all_from_csv = [pd.read_csv(f) for f in sorted(DATA_DIR.glob('*.csv'),
                                                   key=lambda x: x.lstat().st_mtime)]
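Sorting by st_mtime follows the filesystem timestamps, not the names, and those timestamps change when a file is edited or copied without preserving metadata. A self-contained sketch that assigns hypothetical modification times with os.utime in a temporary folder shows the ordering this produces:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_dir = pathlib.Path(tmp)
    # Hypothetical files whose names do NOT match their modification order
    for name, mtime in [("b.csv", 1_700_000_300), ("a.csv", 1_700_000_100), ("c.csv", 1_700_000_200)]:
        path = data_dir / name
        path.write_text("x\n1\n")
        os.utime(path, (mtime, mtime))  # set (access time, modification time) explicitly
    ordered = [p.name for p in sorted(data_dir.glob("*.csv"), key=lambda p: p.lstat().st_mtime)]

print(ordered)  # ['a.csv', 'c.csv', 'b.csv']
```

If the numbers in the file names already encode the intended order, sorting on those numbers is usually more robust than relying on timestamps.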
Answer 4
Score: 0
You can retrieve the last modification date of a CSV file using os.path.getmtime(). Store each file and its date in a list, sort that list by date, and then open the DataFrames in the sorted order.
import os
import time
import pandas as pd

path_to_csv_files = "./csv_files/"

# list in which we'll store the path and the last modification date of each csv file
metadata = list()
for root, _, files in os.walk(path_to_csv_files):
    for name in files:
        path_to_csv = os.path.join(root, name)
        # retrieve the last modification date and format it so it is numerically sortable
        modification_date = time.strftime("%Y%m%d%H%M%S", time.gmtime(os.path.getmtime(path_to_csv)))
        # turn it into an int so we can sort the metadata by date
        modification_date = int(modification_date)
        metadata.append((path_to_csv, modification_date))

# sort the metadata by date (oldest first)
metadata = sorted(metadata, key=lambda x: x[1])

# list of dataframes placed in date order
list_of_df_from_csv = list()
for path_to_csv, _ in metadata:
    df = pd.read_csv(path_to_csv)
    list_of_df_from_csv.append(df)
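os.path.getmtime() already returns a float (seconds since the epoch), so it can serve as the sort key directly. The strftime/int round-trip above is optional, and it also drops fractions of a second because time.gmtime() ignores them, so files modified within the same second tie. A small pure-Python comparison using hypothetical timestamps:

```python
import time

# Hypothetical modification times, as os.path.getmtime() would return them
mtimes = {"late.csv": 1_700_000_000.8, "early.csv": 1_700_000_000.2, "first.csv": 1_699_999_000.0}

def formatted_key(name):
    # The answer's approach: YYYYmmddHHMMSS as an int (whole seconds only)
    return int(time.strftime("%Y%m%d%H%M%S", time.gmtime(mtimes[name])))

by_float = sorted(mtimes, key=mtimes.get)      # uses the raw float timestamps
by_formatted = sorted(mtimes, key=formatted_key)

print(by_float)      # ['first.csv', 'early.csv', 'late.csv']
print(by_formatted)  # ['first.csv', 'late.csv', 'early.csv'] (tie kept in insertion order)
```

In the answer's loop this would mean appending (path_to_csv, os.path.getmtime(path_to_csv)) instead, keeping whatever sub-second precision the filesystem records.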
Answer 5
Score: 0
I tried something like this, sorting by modification time, and it works perfectly:
import os
import pandas as pd

def csv_to_df():
    folder_path = "output/csv_files"
    files = [os.path.join(folder_path, f) for f in os.listdir(folder_path) if f.endswith('.csv')]
    # sort the full paths by last modification time (oldest first)
    files = sorted(files, key=os.path.getmtime)
    dff_all_from_csv = []
    for file in files:
        df = pd.read_csv(file)
        dff_all_from_csv.append(df)
    return dff_all_from_csv