Use unique groups to remove files from pathway
Question
I have `files` with different `dates` but the same `tags` per group. From these, I want to **keep only the most recent file** from **each group**. In code, I can group them with a `dictionary` whose `keys` are the tags. However, when I delete files in the directory (the pathway), only the most recent file of a single group remains there.

    import glob
    import os

    test = ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
            'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
            'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
            'Group_2020-01-03_DEF_Red_2019-01-30.csv',
            'Group_2020-01-13_DEF_Red_2019-01-30.csv',
            'Group_2020-01-24_DEF_Red_2019-01-30.csv',
            'Group_2020-01-03_GHI_Green_2019-03-28.csv',
            'Group_2020-01-13_GHI_Green_2019-03-28.csv',
            'Group_2020-01-24_GHI_Green_2019-03-28.csv']

    dictionary = {}
    for file in glob.glob(path + '*'):  # or test
        # the key is everything after the first date, without the extension
        key = os.path.basename(file).split('_', 2)[-1].split('.')[0]
        group = dictionary.get(key, [])
        group.append(os.path.basename(file))
        dictionary[key] = group

This produces:

    {'ABC_Blue_2018-12-18': ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
                             'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
                             'Group_2020-01-24_ABC_Blue_2018-12-18.csv'],
     'DEF_Red_2019-01-30': ['Group_2020-01-03_DEF_Red_2019-01-30.csv',
                            'Group_2020-01-13_DEF_Red_2019-01-30.csv',
                            'Group_2020-01-24_DEF_Red_2019-01-30.csv'],
     'GHI_Green_2019-03-28': ['Group_2020-01-03_GHI_Green_2019-03-28.csv',
                              'Group_2020-01-13_GHI_Green_2019-03-28.csv',
                              'Group_2020-01-24_GHI_Green_2019-03-28.csv']}

When I try to remove the files from `2020-01-03` and `2020-01-13`, only one file from `2020-01-24` is left in the directory instead of one per group. My understanding is that the groups do not exist in the directory itself, so `os.remove` keeps just one file overall, but I cannot figure out how to make it behave the same way as it does with the dictionary.

    for k,v in dictionary.items():
        print(k)
        for file in v:
            print(file)
            if os.path.join(path, file) != max(glob.glob(path + '*')):
                test.remove(file)
                # os.remove(os.path.join(path, file))

Printing the `key` and `values` shows that the groups are assigned properly, and removing entries from the list works as desired.

    ABC_Blue_2018-12-18
    Group_2020-01-03_ABC_Blue_2018-12-18.csv
    Group_2020-01-13_ABC_Blue_2018-12-18.csv
    Group_2020-01-24_ABC_Blue_2018-12-18.csv
    DEF_Red_2019-01-30
    Group_2020-01-03_DEF_Red_2019-01-30.csv
    Group_2020-01-13_DEF_Red_2019-01-30.csv
    Group_2020-01-24_DEF_Red_2019-01-30.csv
    GHI_Green_2019-03-28
    Group_2020-01-03_GHI_Green_2019-03-28.csv
    Group_2020-01-13_GHI_Green_2019-03-28.csv
    Group_2020-01-24_GHI_Green_2019-03-28.csv

Result from the list (desired):

    ['Group_2020-01-24_ABC_Blue_2018-12-18.csv',
     'Group_2020-01-24_DEF_Red_2019-01-30.csv',
     'Group_2020-01-24_GHI_Green_2019-03-28.csv']

Result from the pathway (actual):

    'Group_2020-01-24_GHI_Green_2019-03-28.csv'

Additionally, if I use `glob.glob` to refer to the pathway, files are deleted, but an error is raised because the code looks for a file that was just deleted. Running the code again deletes more files and raises the same error.

    dictionary = {}
    for file in glob.glob(path + '*'):
        key = os.path.basename(file).split('_',2)[-1].split('.')[0]
        group = dictionary.get(key,[])
        group.append(os.path.basename(file))
        dictionary[key] = group
    for k,v in dictionary.items():
        if os.path.basename(file) != max(v):
            os.remove(file)

    FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\path\\Group_2020-01-03_GHI_Green_2018-12-18.csv'

Answer 1
Score: 0
I managed to get the following working code.
As desired, it removes the older files based on the first date that appears in each file name. This way, if you add newer files with the same tag to the pathway, running it again removes the ones that are no longer the most recent.

    import os
    from datetime import datetime

    directory = "path"

    # create a dictionary to store the most recent files from each group
    most_recent_files = {}
    for file in os.listdir(directory):
        file = str(file)[:-4]  # drop the '.csv' extension
        file_parts = file.split("_")  # split to extract date and group name
        file_date = datetime.strptime(file_parts[1], "%Y-%m-%d")  # convert to datetime object
        group_name = "_".join(file_parts[2:5])
        file = file + '.csv'
        # update dictionary with most recent file for each group
        if group_name in most_recent_files:
            if file_date > most_recent_files[group_name][0]:
                os.remove(os.path.join(directory, most_recent_files[group_name][1]))  # remove older file
                most_recent_files[group_name] = (file_date, file)  # update most recent file
            else:
                os.remove(os.path.join(directory, file))  # remove current, older, file
        else:
            most_recent_files[group_name] = (file_date, file)  # add first file to dictionary

    # print recent files from each group
    for group, file_info in most_recent_files.items():
        print(f"{group}: {file_info[1]}")

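
Since this deletes files while it is still scanning, it can help to separate the decision from the deletion so the result can be checked first. The following is only a sketch of that idea, not a replacement for the code above. The function names `split_name`, `select_stale_files` and `prune` are made up for illustration, and it assumes every file follows the `Group_<date>_<tag>.csv` pattern from the question (anything else is skipped).

```python
import os
from collections import defaultdict
from datetime import datetime


def split_name(filename):
    """Return (date, tag) for a name shaped like Group_<date>_<tag>.csv."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    _prefix, date_text, tag = stem.split("_", 2)  # ValueError if too few parts
    return datetime.strptime(date_text, "%Y-%m-%d"), tag


def select_stale_files(filenames):
    """Return every file name that is not the newest one in its group."""
    groups = defaultdict(list)
    for name in filenames:
        try:
            file_date, tag = split_name(name)
        except ValueError:
            continue  # assumption: names that do not match the pattern are left alone
        groups[tag].append((file_date, name))

    stale = []
    for dated_names in groups.values():
        newest = max(dated_names)  # newest within this group only
        stale.extend(name for item_date, name in dated_names
                     if (item_date, name) != newest)
    return stale


def prune(directory, dry_run=True):
    """Report stale files in `directory`; delete them only if dry_run is False."""
    stale = select_stale_files(os.listdir(directory))
    if not dry_run:
        for name in stale:
            os.remove(os.path.join(directory, name))
    return stale
```

With the `test` list from the question, `[f for f in test if f not in select_stale_files(test)]` leaves the three `2020-01-24` files, one per group. Calling `prune(directory)` first with the default `dry_run=True` shows what would be removed before anything is actually deleted.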