Find similar time in filename

Question


I have an Excel file that lists specific times (e.g. 17:40, 18:15, 10:11) for specific dates, and a folder containing multiple files, some of which have a "similar time" in their name, e.g.:

XXXXXMCCAAS_17_43_22_Timecheck.csv

So, looking through the folder, I have to find the one file whose name contains something close to 17:40, which here would be 17_43_22.

Is there any way to automatically print the files that match my specific "pattern"?

My first attempt would be to pick out the files with a regular expression and assume, say, a 10-15 minute window.

But is there a better way to do this?

Answer 1

Score: 2

You can do this in Python with the standard-library modules os, re and datetime plus pandas (a third-party package). The script below reads the times from the Excel file, formats them, and then loops through the files in the specified directory, printing every file whose embedded time falls within a defined window (15 minutes by default) of any Excel time.

import os
import re
import pandas as pd
from datetime import datetime

# Read the Excel file
df = pd.read_excel('times.xlsx')  # adjust this to your file path and name

# Convert the times to the expected format and create a list.
# Note: the .dt accessor only works if pandas parsed the column as datetimes.
times = df['Time'].dt.strftime('%H_%M_%S').tolist()  # adjust 'Time' to your column name

# Define a function to check if a filename time is within a time window (in minutes)
def within_window(filename_time, excel_time, window=15):
    FMT = '%H_%M_%S'
    tdelta = datetime.strptime(filename_time, FMT) - datetime.strptime(excel_time, FMT)
    return abs(tdelta.total_seconds()) <= window * 60  # compare in seconds

# Loop through the files in the directory
for filename in os.listdir('/path/to/your/files'):  # adjust this to your directory path
    # Use regex to extract the time from the filename
    match = re.search(r'\d{2}_\d{2}_\d{2}', filename)
    if match:
        filename_time = match.group()
        # Check if the filename time is within the window for any Excel time
        for excel_time in times:
            if within_window(filename_time, excel_time):
                print(filename)
                break  # if a match is found, no need to check the other times

Make sure to adjust the Excel file path, the directory path, and the column name according to your setup.
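
If only the single closest file per Excel time is wanted, and times near midnight should also match (23:55 against 00_03_45), the comparison can be done on seconds since midnight instead. The following is a minimal sketch under those assumptions, not part of the original answer: the helper names (filename_seconds, circular_gap, closest_file) and the sample file list are made up for illustration, and the Excel times are assumed to be available as 'HH:MM' or 'HH:MM:SS' strings.

```python
import re

# HH_MM_SS not glued to other digits; names without such a group are skipped
TIME_RE = re.compile(r'(?<!\d)(\d{2})_(\d{2})_(\d{2})(?!\d)')

def filename_seconds(filename):
    """Return the time embedded in the filename as seconds since midnight, or None."""
    m = TIME_RE.search(filename)
    if not m:
        return None
    hh, mm, ss = map(int, m.groups())
    if hh > 23 or mm > 59 or ss > 59:
        return None  # three digit pairs that are not a valid time of day
    return hh * 3600 + mm * 60 + ss

def circular_gap(a, b):
    """Smallest distance in seconds between two times of day, wrapping at midnight."""
    diff = abs(a - b) % 86400
    return min(diff, 86400 - diff)

def closest_file(target, filenames, window_minutes=15):
    """Return the filename closest to target ('HH:MM' or 'HH:MM:SS') within the window, else None."""
    parts = [int(p) for p in target.split(':')] + [0, 0]
    target_s = parts[0] * 3600 + parts[1] * 60 + parts[2]
    best, best_gap = None, window_minutes * 60 + 1
    for name in filenames:
        s = filename_seconds(name)
        if s is None:
            continue
        gap = circular_gap(s, target_s)
        if gap < best_gap:
            best, best_gap = name, gap
    return best

# Hypothetical sample data; in practice the names would come from os.listdir(...)
names = [
    'XXXXXMCCAAS_00_03_45_Timecheck.csv',
    'XXXXXMCCAAS_17_24_02_Timecheck.csv',
    'XXXXXMCCAAS_17_43_22_Timecheck.csv',
    'XXXXXMCCAAS_18_24_16_Timecheck.csv',
]
for t in ['17:40', '18:15', '10:11', '23:55']:
    print(t, '->', closest_file(t, names))
```

Unlike the loop above, this reports at most one file per Excel time and prints None when nothing falls inside the window, which makes missing files easy to spot.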

Answer 2

Score: 1

For the first part of your question, I would do it this way:

from pathlib import Path
import pandas as pd

times = (
    pd.read_excel("tmp/file.xlsx", usecols="A", dtype="str")
    .squeeze().str[:4].replace(":", "_", regex=True).tolist()
) #['17_4', '18_1', '10_1']

matches = []
for t in times:
    matches.extend(Path("tmp").glob(f"*_{t}*"))

Output:

for m in matches:
    print(m)

tmp\XXXXXMCCAAS_17_43_22_Timecheck.csv
tmp\XXXXXMCCAAS_10_10_34_Timecheck.csv
tmp\XXXXXMCCAAS_10_11_23_Timecheck.csv

Spreadsheet and directory tree used:

tmp/
┣━━ file.xlsx # <-- doesn't need to be in the same location
┣━━ XXXXXMCCAAS_00_03_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_25_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_45_43_Timecheck.csv
┣━━ XXXXXMCCAAS_01_16_35_Timecheck.csv
┣━━ XXXXXMCCAAS_03_01_51_Timecheck.csv
┣━━ XXXXXMCCAAS_03_04_12_Timecheck.csv
┣━━ XXXXXMCCAAS_04_47_22_Timecheck.csv
┣━━ XXXXXMCCAAS_09_27_07_Timecheck.csv
┣━━ XXXXXMCCAAS_10_10_34_Timecheck.csv
┣━━ XXXXXMCCAAS_10_11_23_Timecheck.csv
┣━━ XXXXXMCCAAS_13_28_15_Timecheck.csv
┣━━ XXXXXMCCAAS_14_09_30_Timecheck.csv
┣━━ XXXXXMCCAAS_14_59_41_Timecheck.csv
┣━━ XXXXXMCCAAS_16_33_31_Timecheck.csv
┣━━ XXXXXMCCAAS_17_24_02_Timecheck.csv
┣━━ XXXXXMCCAAS_17_43_22_Timecheck.csv
┣━━ XXXXXMCCAAS_18_24_16_Timecheck.csv
┣━━ XXXXXMCCAAS_19_19_42_Timecheck.csv
┣━━ XXXXXMCCAAS_20_42_18_Timecheck.csv
┗━━ XXXXXMCCAAS_21_21_25_Timecheck.csv
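
The prefix match in this answer keeps the hour and only the tens digit of the minute, so each Excel time selects a fixed ten-minute bucket rather than a window centred on the time. The sketch below illustrates that behaviour on a hypothetical list of names; prefix_pattern is a made-up helper, and fnmatchcase from the standard library is used here as a stand-in for Path.glob so that no files are needed.

```python
from fnmatch import fnmatchcase

def prefix_pattern(hhmm):
    """'HH:MM' -> '*_HH_M*' (hour plus the tens digit of the minute)."""
    return f"*_{hhmm[:4].replace(':', '_')}*"

# Hypothetical file names chosen to show the edges of the bucket
names = [
    "XXXXXMCCAAS_17_39_50_Timecheck.csv",  # 10 seconds before 17:40
    "XXXXXMCCAAS_17_43_22_Timecheck.csv",
    "XXXXXMCCAAS_17_49_59_Timecheck.csv",  # almost 10 minutes after 17:40
    "XXXXXMCCAAS_10_17_45_Timecheck.csv",  # minute 17, second 45
]

pattern = prefix_pattern("17:40")
matched = [n for n in names if fnmatchcase(n, pattern)]
print(pattern)
for n in matched:
    print(n)
```

Under these assumptions 17_39_50 is missed although it is only ten seconds away, while 10_17_45 is picked up because its minute and second fields happen to contain _17_4. If either case matters for your data, comparing parsed times against a window is the safer choice.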

huangapple
  • Posted on June 5, 2023 at 21:47:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407079.html