Use unique groups to remove files from pathway

Question

I have `files` with different `dates` but the same `tags` within each group. From these, I want to **keep only the most recent file** from **each group**. In code, I can do this with a `dictionary` whose `keys` are the tags. However, in the folder itself only the most recent file of a single group remains.

    test = ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
    'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
    'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
    'Group_2020-01-03_DEF_Red_2019-01-30.csv',
    'Group_2020-01-13_DEF_Red_2019-01-30.csv',
    'Group_2020-01-24_DEF_Red_2019-01-30.csv',
    'Group_2020-01-03_GHI_Green_2019-03-28.csv',
    'Group_2020-01-13_GHI_Green_2019-03-28.csv',
    'Group_2020-01-24_GHI_Green_2019-03-28.csv']

    import glob
    import os

    # path: folder prefix ending in a path separator (the pattern is path + '*')
    dictionary = {}
    for file in glob.glob(path + '*'):  # or test
        key = os.path.basename(file).split('_', 2)[-1].split('.')[0]
        group = dictionary.get(key, [])
        group.append(os.path.basename(file))
        dictionary[key] = group

The output is:

    {'ABC_Blue_2018-12-18': ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
                             'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
                             'Group_2020-01-24_ABC_Blue_2018-12-18.csv'],
     'DEF_Red_2019-01-30': ['Group_2020-01-03_DEF_Red_2019-01-30.csv',
                            'Group_2020-01-13_DEF_Red_2019-01-30.csv',
                            'Group_2020-01-24_DEF_Red_2019-01-30.csv'],
     'GHI_Green_2019-03-28': ['Group_2020-01-03_GHI_Green_2019-03-28.csv',
                              'Group_2020-01-13_GHI_Green_2019-03-28.csv',
                              'Group_2020-01-24_GHI_Green_2019-03-28.csv']}
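
Given this dictionary, the newest file of each group is simply `max(v)`: within a group the names differ only in the zero-padded ISO date right after `Group_`, so string order matches chronological order. A minimal sketch on the `test` list, assuming that naming pattern and using `collections.defaultdict` in place of the `get`/append/assign pattern (the helper name `group_by_tag` is hypothetical):

```python
from collections import defaultdict

test = ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-03_DEF_Red_2019-01-30.csv',
        'Group_2020-01-13_DEF_Red_2019-01-30.csv',
        'Group_2020-01-24_DEF_Red_2019-01-30.csv',
        'Group_2020-01-03_GHI_Green_2019-03-28.csv',
        'Group_2020-01-13_GHI_Green_2019-03-28.csv',
        'Group_2020-01-24_GHI_Green_2019-03-28.csv']


def group_by_tag(names):
    """Map each tag (text after the second '_', without extension) to its names."""
    groups = defaultdict(list)
    for name in names:
        key = name.split('_', 2)[-1].split('.')[0]
        groups[key].append(name)
    return dict(groups)


groups = group_by_tag(test)
# Zero-padded ISO dates compare correctly as plain strings.
newest = {key: max(names) for key, names in groups.items()}
for key, name in newest.items():
    print(key, '->', name)
```

This only decides which names to keep; nothing on disk is touched yet.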

When I try to remove the files dated 2020-01-03 and 2020-01-13, only one file, dated 2020-01-24, is left in the folder instead of one per group. My understanding is that the groups do not exist in the folder, so `os.remove` just keeps one file overall, but I cannot figure out how to make it behave the same way as it does inside the dictionary.

    for k, v in dictionary.items():
        print(k)
        for file in v:
            print(file)
            if os.path.join(path, file) != max(glob.glob(path + '*')):
                test.remove(file)
                # os.remove(os.path.join(path, file))

Printing the keys and values shows that the groups are assigned properly, and removing the entries from the `test` list works as desired:

    ABC_Blue_2018-12-18
    Group_2020-01-03_ABC_Blue_2018-12-18.csv
    Group_2020-01-13_ABC_Blue_2018-12-18.csv
    Group_2020-01-24_ABC_Blue_2018-12-18.csv
    DEF_Red_2019-01-30
    Group_2020-01-03_DEF_Red_2019-01-30.csv
    Group_2020-01-13_DEF_Red_2019-01-30.csv
    Group_2020-01-24_DEF_Red_2019-01-30.csv
    GHI_Green_2019-03-28
    Group_2020-01-03_GHI_Green_2019-03-28.csv
    Group_2020-01-13_GHI_Green_2019-03-28.csv
    Group_2020-01-24_GHI_Green_2019-03-28.csv

Resulting list (desired):

    ['Group_2020-01-24_ABC_Blue_2018-12-18.csv',
     'Group_2020-01-24_DEF_Red_2019-01-30.csv',
     'Group_2020-01-24_GHI_Green_2019-03-28.csv']

Files left in the folder:

    'Group_2020-01-24_GHI_Green_2019-03-28.csv'
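
The folder result can be reproduced without touching the disk. In the loop above, `max(glob.glob(path + '*'))` is evaluated over every file in the folder rather than over the current group `v`, so every file except the single overall maximum passes the `!=` test and is removed. The sketch below compares that global maximum with a per-group `max(v)` on the `test` list; it assumes the folder holds exactly these names (the full paths returned by `glob` share the same prefix, so they sort the same way):

```python
test = ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-03_DEF_Red_2019-01-30.csv',
        'Group_2020-01-13_DEF_Red_2019-01-30.csv',
        'Group_2020-01-24_DEF_Red_2019-01-30.csv',
        'Group_2020-01-03_GHI_Green_2019-03-28.csv',
        'Group_2020-01-13_GHI_Green_2019-03-28.csv',
        'Group_2020-01-24_GHI_Green_2019-03-28.csv']

# What max(glob.glob(path + '*')) effectively picks: one maximum for the whole folder.
overall_max = max(test)
kept_global = [name for name in test if name == overall_max]

# What was intended: one maximum per tag group.
groups = {}
for name in test:
    key = name.split('_', 2)[-1].split('.')[0]
    groups.setdefault(key, []).append(name)
kept_per_group = [max(names) for names in groups.values()]

print(kept_global)
print(kept_per_group)
```

`kept_global` contains only the GHI file, matching what was left in the folder, while `kept_per_group` matches the desired list.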

Additionally, if I loop over `glob.glob` to work on the folder directly, files are deleted, but the script then raises an error about a file it has just deleted. Running the code again deletes more files and fails with the same error.

    dictionary = {}
    for file in glob.glob(path + '*'):

        key = os.path.basename(file).split('_', 2)[-1].split('.')[0]
        group = dictionary.get(key, [])
        group.append(os.path.basename(file))
        dictionary[key] = group

        for k, v in dictionary.items():
            if os.path.basename(file) != max(v):
                os.remove(file)

    FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\path\\Group_2020-01-03_GHI_Green_2018-12-18.csv'
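
The `FileNotFoundError` comes from the nesting: the `for k, v in dictionary.items()` loop runs inside the file loop while the dictionary is still being filled, and it compares the current file with the maximum of *every* group, not only its own. As soon as a file differs from two groups' maxima, `os.remove` is called on it twice and the second call fails. A two-pass sketch avoids this: group all names first, then delete each stale file exactly once. `prune_to_newest` is a hypothetical helper name, and the demo uses a throwaway `tempfile.TemporaryDirectory` instead of the real folder:

```python
import glob
import os
import tempfile
from collections import defaultdict


def prune_to_newest(folder):
    """Keep only the newest file per tag in `folder`; return the kept names.

    Pass 1 groups the names; pass 2 deletes each stale file exactly once.
    """
    groups = defaultdict(list)
    for full_path in glob.glob(os.path.join(folder, '*.csv')):
        name = os.path.basename(full_path)
        key = name.split('_', 2)[-1].split('.')[0]
        groups[key].append(name)

    kept = []
    for names in groups.values():
        newest = max(names)  # zero-padded ISO dates sort correctly as strings
        kept.append(newest)
        for name in names:
            if name != newest:
                os.remove(os.path.join(folder, name))
    return sorted(kept)


test = ['Group_2020-01-03_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-13_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
        'Group_2020-01-03_DEF_Red_2019-01-30.csv',
        'Group_2020-01-13_DEF_Red_2019-01-30.csv',
        'Group_2020-01-24_DEF_Red_2019-01-30.csv',
        'Group_2020-01-03_GHI_Green_2019-03-28.csv',
        'Group_2020-01-13_GHI_Green_2019-03-28.csv',
        'Group_2020-01-24_GHI_Green_2019-03-28.csv']

# Demo on a temporary folder populated with empty files named like the question's.
with tempfile.TemporaryDirectory() as folder:
    for name in test:
        open(os.path.join(folder, name), 'w').close()
    kept = prune_to_newest(folder)
    remaining = sorted(os.listdir(folder))

print(kept)
print(remaining)
```

Because deletion happens only after grouping is complete, each file is compared with its own group's maximum and removed at most once.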

Answer 1

Score: 0

I managed to get the following working code.

As desired, it removes the older files based on the first date in each file name. This way, if newer files with the same tag are later added to the folder, rerunning the script removes the ones that are no longer the most recent.

    import os
    from datetime import datetime

    directory = "path"

    # create a dictionary to store the most recent file from each group
    most_recent_files = {}

    for file in os.listdir(directory):
        name, ext = os.path.splitext(file)  # strip the ".csv" extension
        if ext.lower() != ".csv":
            continue  # skip files that do not follow the naming pattern
        file_parts = name.split("_")  # split to extract date and group name
        file_date = datetime.strptime(file_parts[1], "%Y-%m-%d")  # first date in the name
        group_name = "_".join(file_parts[2:5])  # e.g. "ABC_Blue_2018-12-18"

        # update dictionary with most recent file for each group
        if group_name in most_recent_files:
            if file_date > most_recent_files[group_name][0]:
                os.remove(os.path.join(directory, most_recent_files[group_name][1]))  # remove older file
                most_recent_files[group_name] = (file_date, file)  # update most recent file
            else:
                os.remove(os.path.join(directory, file))  # remove current, older, file
        else:
            most_recent_files[group_name] = (file_date, file)  # add first file to dictionary

    # print recent files from each group
    for group, file_info in most_recent_files.items():
        print(f"{group}: {file_info[1]}")
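
To make the answer's parsing easier to check, here is the same split isolated into a hypothetical helper `parse_name`, followed by a dry run that only records which file each group would keep, without calling `os.remove`. It assumes the `Group_<date>_<tag>.csv` pattern from the question; `os.path.splitext` replaces the `[:-4]` slice, and `parts[2:]` (rather than `parts[2:5]`) assumes the tag is everything after the first date:

```python
import os
from datetime import datetime


def parse_name(filename):
    """Split 'Group_<date>_<tag>.csv' into (group_name, date)."""
    stem, _ext = os.path.splitext(filename)  # drop '.csv'
    parts = stem.split('_')
    file_date = datetime.strptime(parts[1], '%Y-%m-%d')  # first date = file date
    group_name = '_'.join(parts[2:])  # e.g. 'ABC_Blue_2018-12-18'
    return group_name, file_date


# Deliberately unordered, as os.listdir gives no ordering guarantee.
names = ['Group_2020-01-13_DEF_Red_2019-01-30.csv',
         'Group_2020-01-24_ABC_Blue_2018-12-18.csv',
         'Group_2020-01-03_DEF_Red_2019-01-30.csv',
         'Group_2020-01-24_DEF_Red_2019-01-30.csv',
         'Group_2020-01-03_ABC_Blue_2018-12-18.csv']

# Dry run: record the newest (date, name) per group without deleting anything.
newest = {}
for name in names:
    group, file_date = parse_name(name)
    if group not in newest or file_date > newest[group][0]:
        newest[group] = (file_date, name)

for group, (file_date, name) in newest.items():
    print(f"{group}: {name}")
```

Once the dry run reports the expected names, the same comparison can drive the `os.remove` calls.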

huangapple
  • Published on 2023-03-03 18:18:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75625789.html