How can I limit os.walk results for a single file?

Question

I am trying to search a given directory for a specific file, and if that file does not exist I want the code to say "File does not exist". I can currently get this to work with os.walk, but it hits on every single file that isn't the specified file and prints "File does not exist" each time. I know that this is how os.walk functions, but I am not sure whether there is a way to make it print only once, whether the file is found or not.

Folder structure:

    root folder
    |-- Project Folder
        |-- file.xml
        |-- other files/subfolders

I want the code to go inside "Project Folder", do a recursive search for "file.xml", and print "Found" once if the file is found, otherwise print "Not found" once.

The code is:

    def check_file(x):  # x = root folder dir
        for d in next(os.walk(x))[1]:  # if I understand correctly, [1] will be Project Folder
            for root, directories, files in os.walk(x):
                for name in files:
                    if "file.xml" not in name:
                        print("found")
                    else:
                        print("File Missing")

If I change the code to

    for name in files:
        if "file.xml" in name:
            print("found")
        else:
            pass

The code technically works as intended, but it doesn't do much to point out when the file isn't there, so this isn't a good solution. It would be easier if I could give the code a specific path to look in. However, the user can place the 'root folder' anywhere on their machine, and the 'project folder' has a different name depending on the project, so I don't think I can give the code a specific location.

Is there a way to get this to work with os.walk, or would another method work best?

Answer 1

Score: 3

The glob module is very convenient for this kind of wildcard-based recursive search. Particularly, the ** wildcard matches a directory tree of arbitrary depth, so you can find a file anywhere in the descendants of your root directory.

For example:

    import glob

    def check_file(x):  # where x is the root directory for the search
        # Note: the root_dir parameter requires Python 3.10 or later
        files = glob.glob('**/file.xml', root_dir=x, recursive=True)
        if files:
            print(f"Found {len(files)} matching files")
        else:
            print("Did not find a matching file")
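When only a yes/no answer is needed, the same ** pattern can be consumed lazily so the search stops at the first hit. The following is a minimal sketch, not part of the original answer: the helper name file_exists_under is hypothetical, and the pattern is built with os.path.join instead of root_dir so that it should also run on Python versions older than 3.10.

```python
import glob
import os
import tempfile


def file_exists_under(root_dir, base_name):
    """Return True as soon as one match is found at or below root_dir."""
    # glob.escape keeps special characters in the real paths from being
    # interpreted as wildcards; "**" matches zero or more directories.
    pattern = os.path.join(glob.escape(root_dir), "**", glob.escape(base_name))
    # iglob yields matches lazily, so next() stops after the first one.
    return next(glob.iglob(pattern, recursive=True), None) is not None


if __name__ == "__main__":
    # Throwaway directory tree used purely for illustration.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "Project Folder", "sub"))
        open(os.path.join(root, "Project Folder", "sub", "file.xml"), "w").close()
        print("Found" if file_exists_under(root, "file.xml") else "Not found")
        print("Found" if file_exists_under(root, "missing.xml") else "Not found")
```

One caveat worth checking for your use case: by default, glob patterns do not match names that begin with a dot, so files inside hidden directories may be skipped.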

Answer 2

Score: 2

Listing [Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False).

You don't need 2 nested loops. On each iteration, you only need to check whether the base file name is present in the 3rd member of the tuple that os.walk produces (the list of file names in the current directory).

This implementation handles the case of a file being present in multiple directories. If you only need to print the file once (no matter how many times it's present in the directory tree), there's the function search_file_once, which relies on the for ... else construct: the else branch runs only when the loop finishes without hitting break.

code00.py:

    #!/usr/bin/env python

    import os
    import sys


    def search_file(root_dir, base_name):
        found = 0
        for root, dirs, files in os.walk(root_dir):
            if base_name in files:
                print("Found: {:s}".format(os.path.join(root, base_name)))
                found += 1
        if not found:
            print("Not found")


    # @TODO - cfati: Only care if file is found once
    def search_file_once(root_dir, base_name):
        for root, dirs, files in os.walk(root_dir):
            if base_name in files:
                print("Found: {:s}".format(os.path.join(root, base_name)))
                break
        else:
            print("Not found")


    def main(*argv):
        root = os.path.dirname(os.path.abspath(__file__))
        files = (
            "once.xml",
            "multiple.xml",
            "notpresent.xml",
        )
        for file in files:
            print("\nSearching recursively for {:s} in {:s}".format(file, root))
            search_file(root, file)


    if __name__ == "__main__":
        print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                       64 if sys.maxsize > 0x100000000 else 32, sys.platform))
        rc = main(*sys.argv[1:])
        print("\nDone.\n")
        sys.exit(rc)

Output:

    [cfati@CFATI-5510-0:e:\Work\Dev\StackExchange\StackOverflow\q076383189]> sopr.bat
    ### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

    [prompt]> tree /a /f
    Folder PATH listing for volume SSD0-WORK
    Volume serial number is AE9E-72AC
    E:.
    |   code00.py
    |
    \---dir0
        +---dir00
        +---dir01
        |       multiple.xml
        |       once.xml
        |
        \---dir02
            \---dir020
                    multiple.xml


    [prompt]>
    [prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.10_test0\Scripts\python.exe" ./code00.py
    Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32


    Searching recursively for once.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
    Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\once.xml

    Searching recursively for multiple.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
    Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\multiple.xml
    Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir02\dir020\multiple.xml

    Searching recursively for notpresent.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
    Not found

    Done.

This is just one of several possible ways of doing this. Check [SO]: How do I list all files of a directory? (@CristiFati's answer) for more details.
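Applied to the layout described in the question (a root folder holding one or more project folders with arbitrary names), the same membership check can be run once per project folder, giving exactly one verdict per project. This is only a sketch under that assumption: the names check_projects and report are hypothetical and do not come from the answer above.

```python
import os


def check_projects(root_dir, base_name="file.xml"):
    """Map each immediate subfolder of root_dir to True/False (file present?)."""
    results = {}
    with os.scandir(root_dir) as entries:
        for entry in entries:
            if not entry.is_dir():
                continue  # skip plain files sitting directly in the root folder
            # any() stops walking this project at the first directory
            # whose file list contains base_name.
            results[entry.name] = any(
                base_name in files for _root, _dirs, files in os.walk(entry.path)
            )
    return results


def report(root_dir, base_name="file.xml"):
    """Print one line per project folder: found or not found."""
    for project, present in sorted(check_projects(root_dir, base_name).items()):
        print(f"{project}: {'Found' if present else 'Not found'}")
```

Returning a dictionary rather than printing inside the loop keeps the search reusable; the caller decides how to present the result.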

Answer 3

Score: 1

I have written a function like this, and several others, in the past. I want to provide them all for context; some will work for your case with minimal to no modification.

    import fnmatch
    import os


    ## Find ALL exact-name matches (not just one):
    ## Example usage: findAll('file.xml', '/path/to/dir')
    def findAll(name, path):
        result = []
        for root, dirs, files in os.walk(path):
            if name in files:
                result.append(os.path.join(root, name))
        return result


    ## A function that keeps going until all target files are found
    def findProjectFiles(Folder, targetFiles):
        remaining = set(targetFiles)  # names still to be located
        filePaths = []
        for root, dirs, files in os.walk(Folder):
            for f in files:
                if f in remaining:
                    remaining.discard(f)
                    filePaths.append(os.path.abspath(os.path.join(root, f)))
            if not remaining:  # every target has been found, stop walking
                break
        return filePaths


    ## Find the first matching file path in a folder (returns None if absent):
    def findPaths(name, path):
        for root, dirs, files in os.walk(path):
            if name in files:
                return os.path.join(root, name)
        return None
    ## can search the object returned for the string you want to find easily


    ## Similar, but this will match a pattern (i.e. does not have to be exact file name match).
    ## Example usage: findMatch('*.txt', '/path/to/dir')
    def findMatch(pattern, path):
        result = []
        for root, dirs, files in os.walk(path):
            for name in files:
                if fnmatch.fnmatch(name, pattern):
                    result.append(os.path.join(root, name))
        return result
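The pattern-matching search can also be expressed with pathlib, which walks the tree and applies the wildcard in one call. This is a rough equivalent rather than a drop-in replacement: the name find_match_pathlib is hypothetical, and case-sensitivity rules may differ slightly from fnmatch.fnmatch depending on the platform.

```python
from pathlib import Path


def find_match_pathlib(pattern, path):
    """Return sorted paths of files at or below `path` whose name matches `pattern`."""
    # Path.rglob(pattern) behaves like glob("**/" + pattern): it recurses
    # through all subdirectories. is_file() filters out matching directories.
    return sorted(str(p) for p in Path(path).rglob(pattern) if p.is_file())
```

An empty list means nothing matched, so a single check such as `if find_match_pathlib("file.xml", root):` is enough to print "Found" or "Not found" exactly once.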

huangapple
  • Published on June 1, 2023, 22:53:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76383189.html