Find similar time in filename

Question

I have an Excel file that lists specific times for specific dates, e.g. 17:40, 18:15, 10:11. I also have a folder where multiple files are stored, and some of them have a "similar time" in their name, e.g.

XXXXXMCCAAS_17_43_22_Timecheck.csv

So, by looking through the folder, I have to find the file, out of many, whose name contains something similar to 17:40. Here that would be 17_43_22.

Is there any way to automatically print the files that match my specific "pattern"?

My first shot would be to filter the files with a regular expression and assume a window of, say, 10-15 minutes.

But is there a better way to do this?
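The following is a minimal standard-library sketch of the regex-plus-window idea. It makes several assumptions. Each filename contains its time as `HH_MM_SS`. Only clock times are compared, so dates are ignored and times around midnight are not wrapped. For each target, you want the single nearest file rather than every file in the window. The file times are sorted once, and `bisect` finds the neighbours of each target. `index_files` and `nearest_file` are hypothetical helper names. In practice `files` would come from something like `os.listdir("/path/to/your/files")`.

```python
import bisect
import re

# Assumed filename layout: an HH_MM_SS time somewhere in the name,
# e.g. "XXXXXMCCAAS_17_43_22_Timecheck.csv"
TIME_RE = re.compile(r"(\d{2})_(\d{2})_(\d{2})")


def to_seconds(h, m, s=0):
    """Seconds since midnight; accepts ints or digit strings."""
    return int(h) * 3600 + int(m) * 60 + int(s)


def index_files(filenames):
    """Return parallel lists: sorted seconds and the matching filenames."""
    pairs = []
    for name in filenames:
        match = TIME_RE.search(name)
        if match:  # skip files without an HH_MM_SS time
            pairs.append((to_seconds(*match.groups()), name))
    pairs.sort()
    return [sec for sec, _ in pairs], [name for _, name in pairs]


def nearest_file(keys, names, target, window_minutes=15):
    """Filename nearest to `target` ("HH:MM" or "HH:MM:SS"), or None if no
    file lies within `window_minutes`. Does not wrap around midnight."""
    t = to_seconds(*target.split(":"))
    i = bisect.bisect_left(keys, t)
    best = None  # (gap_in_seconds, filename)
    # In a sorted list only the neighbours of the insertion point can be nearest
    for j in (i - 1, i):
        if 0 <= j < len(keys):
            gap = abs(keys[j] - t)
            if gap <= window_minutes * 60 and (best is None or gap < best[0]):
                best = (gap, names[j])
    return best[1] if best else None


files = [
    "XXXXXMCCAAS_17_24_02_Timecheck.csv",
    "XXXXXMCCAAS_17_43_22_Timecheck.csv",
    "XXXXXMCCAAS_18_24_16_Timecheck.csv",
    "notes.txt",
]
keys, names = index_files(files)
for target in ["17:40", "18:15", "10:11"]:
    print(target, nearest_file(keys, names, target))
# 17:40 XXXXXMCCAAS_17_43_22_Timecheck.csv
# 18:15 XXXXXMCCAAS_18_24_16_Timecheck.csv
# 10:11 None
```

Sorting once makes each lookup a binary search instead of a scan over every file. That only matters for large folders. For a few dozen files, a plain loop works just as well.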

Answer 1

Score: 2

You can do this in Python with the standard-library modules os, re and datetime, plus pandas to read the Excel file. The script below reads the times from the Excel file and formats them. It then loops through the files in the specified directory and prints every file whose time falls within a defined window (15 minutes by default) of any Excel time.

```python
import os
import re
from datetime import datetime, time

import pandas as pd

FMT = '%H_%M_%S'

# Read the Excel file
df = pd.read_excel('times.xlsx')  # adjust this to your file path and name

# Time-only Excel cells are usually read as datetime.time objects, which have
# no .dt accessor, so format each value individually. Text cells such as
# "17:40" or "17:40:00" are parsed first.
def to_hms(value):
    if isinstance(value, (time, datetime)):  # pd.Timestamp is a datetime subclass
        return value.strftime(FMT)
    text = str(value).strip()
    fmt = '%H:%M:%S' if text.count(':') == 2 else '%H:%M'
    return datetime.strptime(text, fmt).strftime(FMT)

times = [to_hms(v) for v in df['Time'].dropna()]  # adjust 'Time' to your column name

# Check whether a filename time is within `window` minutes of an Excel time
def within_window(filename_time, excel_time, window=15):
    tdelta = datetime.strptime(filename_time, FMT) - datetime.strptime(excel_time, FMT)
    return abs(tdelta.total_seconds()) <= window * 60  # compare in seconds

# Loop through the files in the directory
for filename in os.listdir('/path/to/your/files'):  # adjust this to your directory path
    # Use a regex to extract the HH_MM_SS time from the filename
    match = re.search(r'\d{2}_\d{2}_\d{2}', filename)
    if match:
        filename_time = match.group()
        # Check whether the filename time is within the window of any Excel time
        for t in times:
            if within_window(filename_time, t):
                print(filename)
                break  # a match was found, no need to check the other times
```

Make sure to adjust the Excel file path, the directory path, and the column name according to your setup.
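`within_window` above parses both times onto the same dummy date. As a result, a file stamped just after midnight is never matched to an Excel time just before midnight: 00_05_00 and 23_55_00 come out 23h50m apart. The following is a minimal drop-in sketch with the same signature and the same `HH_MM_SS` format. It measures the distance around a 24-hour clock instead. It assumes the file and the Excel time refer to the same or adjacent days, because the filenames carry no date.

```python
from datetime import datetime

FMT = "%H_%M_%S"
SECONDS_PER_DAY = 24 * 60 * 60


def within_window(filename_time, excel_time, window=15):
    """Same signature as the function above, but the comparison is circular:
    23_55_00 and 00_05_00 count as 10 minutes apart."""
    delta = abs(
        (datetime.strptime(filename_time, FMT)
         - datetime.strptime(excel_time, FMT)).total_seconds()
    )
    # Going the other way round the clock may be shorter
    delta = min(delta, SECONDS_PER_DAY - delta)
    return delta <= window * 60  # window is in minutes


print(within_window("00_05_00", "23_55_00"))  # True (10 minutes apart)
print(within_window("17_43_22", "17_40_00"))  # True (3 min 22 s apart)
print(within_window("17_24_02", "17_40_00"))  # False (15 min 58 s apart)
```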

Answer 2

Score: 1

For the first part of your question, I would do it this way:

```python
from pathlib import Path
import pandas as pd

times = (
    pd.read_excel("tmp/file.xlsx", usecols="A", dtype="str")
    .squeeze().str[:4].replace(":", "_", regex=True).tolist()
)  # ['17_4', '18_1', '10_1']

matches = []
for t in times:
    matches.extend(Path("tmp").glob(f"*_{t}*"))
```

Output:

```python
for m in matches:
    print(m)
```

```
tmp\XXXXXMCCAAS_17_43_22_Timecheck.csv
tmp\XXXXXMCCAAS_10_10_34_Timecheck.csv
tmp\XXXXXMCCAAS_10_11_23_Timecheck.csv
```

Spreadsheet & tree used:

```
tmp/
┣━━ file.xlsx # <-- doesn't need to be in the same location
┣━━ XXXXXMCCAAS_00_03_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_25_45_Timecheck.csv
┣━━ XXXXXMCCAAS_00_45_43_Timecheck.csv
┣━━ XXXXXMCCAAS_01_16_35_Timecheck.csv
┣━━ XXXXXMCCAAS_03_01_51_Timecheck.csv
┣━━ XXXXXMCCAAS_03_04_12_Timecheck.csv
┣━━ XXXXXMCCAAS_04_47_22_Timecheck.csv
┣━━ XXXXXMCCAAS_09_27_07_Timecheck.csv
┣━━ XXXXXMCCAAS_10_10_34_Timecheck.csv
┣━━ XXXXXMCCAAS_10_11_23_Timecheck.csv
┣━━ XXXXXMCCAAS_13_28_15_Timecheck.csv
┣━━ XXXXXMCCAAS_14_09_30_Timecheck.csv
┣━━ XXXXXMCCAAS_14_59_41_Timecheck.csv
┣━━ XXXXXMCCAAS_16_33_31_Timecheck.csv
┣━━ XXXXXMCCAAS_17_24_02_Timecheck.csv
┣━━ XXXXXMCCAAS_17_43_22_Timecheck.csv
┣━━ XXXXXMCCAAS_18_24_16_Timecheck.csv
┣━━ XXXXXMCCAAS_19_19_42_Timecheck.csv
┣━━ XXXXXMCCAAS_20_42_18_Timecheck.csv
┗━━ XXXXXMCCAAS_21_21_25_Timecheck.csv
```
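The prefix trick in this answer matches a fixed ten-minute bucket, not a window centred on the target time. For example, `17_4` covers 17:40:00 to 17:49:59, so a file stamped 17:38 is missed. Likewise, `18_1` misses `XXXXXMCCAAS_18_24_16_Timecheck.csv` even though it is only about nine minutes after 18:15.

The sketch below keeps the glob approach but generates one pattern per minute in a ±`window` range. It assumes the Excel times are available as "HH:MM" strings and that names contain the time as `_HH_MM_`. Other number groups in a name could also match the pattern. `minute_patterns` and `find_files` are hypothetical helper names.

```python
from datetime import datetime, timedelta
from pathlib import Path


def minute_patterns(hhmm, window=10):
    """One glob pattern per minute within `window` minutes of `hhmm` ("HH:MM"),
    e.g. '*_17_30_*' ... '*_17_50_*'. Wraps around midnight."""
    center = datetime.strptime(hhmm, "%H:%M")
    return [
        (center + timedelta(minutes=offset)).strftime("*_%H_%M_*")
        for offset in range(-window, window + 1)
    ]


def find_files(directory, hhmm, window=10):
    """Sorted paths in `directory` whose HH_MM part lies within the window."""
    found = set()  # a set avoids duplicates if patterns overlap
    for pattern in minute_patterns(hhmm, window):
        found.update(Path(directory).glob(pattern))
    return sorted(found)
```

With the tree above, `find_files("tmp", "18:15")` also picks up `XXXXXMCCAAS_18_24_16_Timecheck.csv`, because the window spans 18:05 to 18:25. The cost is 2 × `window` + 1 directory scans per target. That is negligible for a folder of this size.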

huangapple
  • Posted on June 5, 2023 21:47:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76407079.html