June 5, 2023 21:47:38

Find similar time in filename

Question

I have an Excel file that lists specific times (e.g. 17:40, 18:15, 10:11) for specific dates,
and I have a folder where multiple files are stored; some of them have a "similar time" in their name,
e.g.

XXXXXMCCAAS_17_43_22_Timecheck.csv

So, looking into the folder, I have to find the file among many whose name contains something close to 17:40, which here would be 17_43_22.

Is there any way to automatically print the files that match my specific "pattern"?

My first attempt would be to select the files with a regular expression, assuming a window of, say, 10-15 minutes.

But is there a better way to do this?

Answer 1

Score: 2

You can do this with a short Python script that uses the standard-library os, re and datetime modules together with pandas (a third-party package). The script reads the times from the Excel file, formats them, and then loops through the files in the specified directory, looking for matches within a defined time window.

import os
import re
import pandas as pd
from datetime import datetime
# Read the Excel file
df = pd.read_excel('times.xlsx')  # adjust this to your file path and name
# Convert the times to the expected format and create a list
# (assumes the column holds datetime-like values, so the .dt accessor is available)
times = df['Time'].dt.strftime('%H_%M_%S').tolist()  # adjust 'Time' to your column name
# Define a function to check if a filename time is within a time window
def within_window(filename_time, excel_time, window=15):
    FMT = '%H_%M_%S'
    # strptime fills in the same default date for both values, so only the time of day is compared
    tdelta = datetime.strptime(filename_time, FMT) - datetime.strptime(excel_time, FMT)
    return abs(tdelta.total_seconds()) <= window * 60  # window is in minutes, compare in seconds
# Loop through the files in the directory
for filename in os.listdir('/path/to/your/files'):  # adjust this to your directory path
    # Use regex to extract the time from the filename
    match = re.search(r'\d{2}_\d{2}_\d{2}', filename)
    if match:
        filename_time = match.group()
        # Check if the filename time is within the window for any of the Excel times
        for time in times:
            if within_window(filename_time, time):
                print(filename)
                break  # if a match is found, no need to check the other times

Make sure to adjust the Excel file path, the directory path, and the column name according to your setup.
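
A fixed window like the one above can print several files for one Excel time, and plain subtraction treats 23:55 and 00:05 as almost a day apart. The following is only a rough sketch of one possible variation: it picks the single closest file per Excel time and measures the gap around midnight. The helper names (`minutes_since_midnight`, `circular_gap`, `closest_files`) are made up for illustration, the Excel times are assumed to be already available as "HH:MM" strings, and the filenames are passed in as a plain list instead of being read from disk.

```python
import re

# HH_MM_SS surrounded by non-digits, as in XXXXXMCCAAS_17_43_22_Timecheck.csv
TIME_IN_NAME = re.compile(r"(?<!\d)(\d{2})_(\d{2})_(\d{2})(?!\d)")
MINUTES_PER_DAY = 24 * 60


def minutes_since_midnight(hours, minutes, seconds=0):
    """Convert a clock time to minutes elapsed since 00:00."""
    return hours * 60 + minutes + seconds / 60


def circular_gap(a, b):
    """Distance in minutes between two clock times, wrapping at midnight."""
    diff = abs(a - b) % MINUTES_PER_DAY
    return min(diff, MINUTES_PER_DAY - diff)


def closest_files(excel_times, filenames, window=15):
    """Map each 'HH:MM' string to the nearest filename within `window` minutes, or None."""
    stamped = []
    for name in filenames:
        m = TIME_IN_NAME.search(name)
        if m:
            h, mi, s = map(int, m.groups())
            # Skip digit groups that are not a valid time of day
            if h < 24 and mi < 60 and s < 60:
                stamped.append((minutes_since_midnight(h, mi, s), name))

    result = {}
    for t in excel_times:
        h, mi = map(int, t.split(":")[:2])
        target = minutes_since_midnight(h, mi)
        best = min(stamped, key=lambda item: circular_gap(item[0], target), default=None)
        if best is not None and circular_gap(best[0], target) <= window:
            result[t] = best[1]
        else:
            result[t] = None
    return result


names = [
    "XXXXXMCCAAS_17_24_02_Timecheck.csv",
    "XXXXXMCCAAS_17_43_22_Timecheck.csv",
    "XXXXXMCCAAS_18_24_16_Timecheck.csv",
    "XXXXXMCCAAS_10_11_23_Timecheck.csv",
]
for excel_time, name in closest_files(["17:40", "18:15", "10:11"], names).items():
    print(excel_time, "->", name)
```

Returning None for times without a nearby file makes gaps visible instead of silently skipping them; whether one file may serve two Excel times is a design choice this sketch leaves open.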

Answer 2

Score: 1

For the first part of your question, I would do it this way:

from pathlib import Path
import pandas as pd
# Read column A as strings, keep the first four characters (e.g. "17:4") and swap ":" for "_"
times = (
    pd.read_excel("tmp/file.xlsx", usecols="A", dtype="str")
    .squeeze().str[:4].replace(":", "_", regex=True).tolist()
)  # ['17_4', '18_1', '10_1']
# Collect every file whose name contains one of those prefixes
matches = []
for t in times:
    matches.extend(Path("tmp").glob(f"*_{t}*"))

Output:

for m in matches:
    print(m)
tmp\XXXXXMCCAAS_17_43_22_Timecheck.csv
tmp\XXXXXMCCAAS_10_10_34_Timecheck.csv
tmp\XXXXXMCCAAS_10_11_23_Timecheck.csv

Spreadsheet and directory tree used:

tmp/
┣━━ file.xlsx # <-- doesn't need to be in the same location
┣━━ XXXXXMCCAAS_00_03_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_25_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_45_43_Timecheck.csv
┣━━ XXXXXMCCAAS_01_16_35_Timecheck.csv
┣━━ XXXXXMCCAAS_03_01_51_Timecheck.csv
┣━━ XXXXXMCCAAS_03_04_12_Timecheck.csv
┣━━ XXXXXMCCAAS_04_47_22_Timecheck.csv
┣━━ XXXXXMCCAAS_09_27_07_Timecheck.csv
┣━━ XXXXXMCCAAS_10_10_34_Timecheck.csv
┣━━ XXXXXMCCAAS_10_11_23_Timecheck.csv
┣━━ XXXXXMCCAAS_13_28_15_Timecheck.csv
┣━━ XXXXXMCCAAS_14_09_30_Timecheck.csv
┣━━ XXXXXMCCAAS_14_59_41_Timecheck.csv
┣━━ XXXXXMCCAAS_16_33_31_Timecheck.csv
┣━━ XXXXXMCCAAS_17_24_02_Timecheck.csv
┣━━ XXXXXMCCAAS_17_43_22_Timecheck.csv
┣━━ XXXXXMCCAAS_18_24_16_Timecheck.csv
┣━━ XXXXXMCCAAS_19_19_42_Timecheck.csv
┣━━ XXXXXMCCAAS_20_42_18_Timecheck.csv
┗━━ XXXXXMCCAAS_21_21_25_Timecheck.csv
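
Note that the four-character prefix selects a fixed ten-minute bucket (17:40 to 17:49 for 17:40), not a window centered on the Excel time. A small sketch of that behavior follows; `prefix_matches` is a hypothetical helper, `fnmatchcase` stands in for `Path.glob` so that no directory is needed, and two of the filenames are invented for the demonstration.

```python
from fnmatch import fnmatchcase

names = [
    "XXXXXMCCAAS_17_39_59_Timecheck.csv",  # invented: one second before 17:40
    "XXXXXMCCAAS_17_43_22_Timecheck.csv",
    "XXXXXMCCAAS_17_49_58_Timecheck.csv",  # invented: almost ten minutes after 17:40
]


def prefix_matches(excel_time, filenames):
    """Apply the same pattern as the glob call: '*_' + first four chars (':' -> '_') + '*'."""
    prefix = excel_time[:4].replace(":", "_")
    pattern = f"*_{prefix}*"
    return [name for name in filenames if fnmatchcase(name, pattern)]


# 17_39_59 is the closest file to 17:40 but falls in the previous bucket
print(prefix_matches("17:40", names))
```

If files just before the Excel time matter, comparing parsed times against a window is probably the safer choice; the prefix approach is attractive mainly for its brevity.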
