How can I limit os.walk results for a single file?

Question

I am trying to search a given directory for a specific file, and if that file does not exist I want the code to say "File does not exist". Currently I can get this to work with os.walk, but it hits on every single file that isn't the specified file and prints "File does not exist" each time. I know that this is how os.walk functions, but I am not sure whether there is a way to make it print only once, whether the file is found or not.

Folder structure:

root folder|
|Project Folder
|file.xml
|other files/subfolders

I would want the code to go inside "Project Folder", do a recursive search for "file.xml", and print "Found" once if the file is found, otherwise print "Not found" once.

The code is:

def check_file(x): #x = root folder dir
    for d in next(os.walk(x))[1]: #if I understand correctly, [1] will be Project Folder
        for root, directories, files in os.walk(x):
            for name in files:
                if "file.xml" not in name:
                    print("found")
                else:
                    print("File Missing")

If I change the code to:

            for name in files:
                if "file.xml" in name:
                    print("found")
                else:
                    pass

The code technically works as intended, but it does nothing to point out that the file isn't there, so this isn't a good solution. It would be easier if I were able to give the code a specific path to look in. However, the user can place the 'root folder' anywhere on their machine, and the 'project folder' has a different name depending on the project, so I don't think I can give the code a specific location.

Is there a way to get this to work with os.walk, or would another method work best?

Answer 1

Score: 3

The glob module is very convenient for this kind of wildcard-based recursive search. Particularly, the ** wildcard matches a directory tree of arbitrary depth, so you can find a file anywhere in the descendants of your root directory.

For example:

import glob

def check_file(x):  # where x is the root directory for the search
    files = glob.glob('**/file.xml', root_dir=x, recursive=True)
    if files:
        print(f"Found {len(files)} matching files")
    else:
        print("Did not find a matching file")
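
Note that the root_dir parameter of glob.glob was only added in Python 3.10. On older versions, a similar check can be sketched with pathlib.Path.rglob. The helper name file_exists_under below is hypothetical and not part of the answer above; unlike the glob version, it stops at the first match instead of collecting all of them.

```python
from pathlib import Path


def file_exists_under(root_dir, base_name="file.xml"):
    """Return True if a file called base_name exists anywhere below root_dir."""
    # rglob walks the tree lazily, so any() stops at the first regular file found.
    return any(p.is_file() for p in Path(root_dir).rglob(base_name))


def check_file(x):  # x is the root directory for the search
    # Exactly one message is printed, whether or not the file exists.
    if file_exists_under(x):
        print("Found")
    else:
        print("Not found")
```

Returning a boolean from the search and printing in a separate function keeps the "print exactly once" requirement easy to see.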

Answer 2

Score: 2

Listing [Python.Docs]: os.walk(top, topdown=True, onerror=None, followlinks=False).

You don't need two nested loops. On each iteration, you only need to check whether the base file name is present in the third member of the tuple that os.walk yields (the list of file names in the current directory).
This implementation handles the case of a file being present in multiple directories. If you only need to print the file once (no matter how many times it appears in the directory tree), there's the function search_file_once.

code00.py:

#!/usr/bin/env python

import os
import sys


def search_file(root_dir, base_name):
    found = 0
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            found += 1
    if not found:
        print("Not found")


# @TODO - cfati: Only care if file is found once
def search_file_once(root_dir, base_name):
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            print("Found: {:s}".format(os.path.join(root, base_name)))
            break
    else:
        print("Not found")


def main(*argv):
    root = os.path.dirname(os.path.abspath(__file__))
    files = (
        "once.xml",
        "multiple.xml",
        "notpresent.xml",
    )
    for file in files:
        print("\nSearching recursively for {:s} in {:s}".format(file, root))
        search_file(root, file)


if __name__ == "__main__":
    print("Python {:s} {:03d}bit on {:s}\n".format(" ".join(elem.strip() for elem in sys.version.split("\n")),
                                                   64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    rc = main(*sys.argv[1:])
    print("\nDone.\n")
    sys.exit(rc)

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackExchange\StackOverflow\q076383189]> sopr.bat
### Set shorter prompt to better fit when pasted in StackOverflow (or other) pages ###

[prompt]> tree /a /f
Folder PATH listing for volume SSD0-WORK
Volume serial number is AE9E-72AC
E:.
|   code00.py
|
\---dir0
    +---dir00
    +---dir01
    |       multiple.xml
    |       once.xml
    |
    \---dir02
        \---dir020
                multiple.xml


[prompt]>
[prompt]> "e:\Work\Dev\VEnvs\py_pc064_03.10_test0\Scripts\python.exe" ./code00.py
Python 3.10.9 (tags/v3.10.9:1dd9be6, Dec  6 2022, 20:01:21) [MSC v.1934 64 bit (AMD64)] 064bit on win32


Searching recursively for once.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\once.xml

Searching recursively for multiple.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir01\multiple.xml
Found: e:\Work\Dev\StackExchange\StackOverflow\q076383189\dir0\dir02\dir020\multiple.xml

Searching recursively for notpresent.xml in e:\Work\Dev\StackExchange\StackOverflow\q076383189
Not found

Done.

This is just one of several possible ways of doing this. Check [SO]: How do I list all files of a directory? (@CristiFati's answer) for more details.
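
The question describes a root folder holding one or more project folders, each of which should be reported once. A minimal sketch of that, assuming every immediate subdirectory of the root counts as a project folder; the names find_first and check_projects are hypothetical helpers, not part of the answer above:

```python
import os


def find_first(root_dir, base_name):
    """Return the full path of the first base_name found under root_dir, or None."""
    for root, dirs, files in os.walk(root_dir):
        if base_name in files:
            return os.path.join(root, base_name)
    return None


def check_projects(root_dir, base_name="file.xml"):
    """Print exactly one line per project folder directly under root_dir."""
    results = {}
    with os.scandir(root_dir) as entries:
        for entry in entries:
            if entry.is_dir():
                # Recursive search limited to this one project folder.
                results[entry.name] = find_first(entry.path, base_name)
    for project, path in sorted(results.items()):
        if path is None:
            print("{:s}: Not found".format(project))
        else:
            print("{:s}: Found {:s}".format(project, path))
    return results
```

Because find_first returns as soon as it sees a match, each project folder produces a single "Found" or "Not found" line regardless of how many files it contains.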

Answer 3

Score: 1

I have written a function like this and several others in the past. I want to provide them all for context; some will work for your case with minimal to no modification.

## Find ALL matches (not just one):
## Example usage: findAll('file.xml', '/path/to/dir')

import os

def findAll(name, path):
    result = []
    for root, dirs, files in os.walk(path):
        if name in files:
            result.append(os.path.join(root, name))
    return result

## A function that keeps going until all target files are found
def findProjectFiles(Folder, targetFiles):
    remaining = set(targetFiles)  # names still to be located
    filePaths = []
    for root, dirs, files in os.walk(Folder):
        for f in files:
            if f in remaining:
                filePaths.append(os.path.abspath(os.path.join(root, f)))
                remaining.discard(f)
        if not remaining:  # every target found, stop walking
            break
    return filePaths

## Find the first matching file path in a folder (returns None if there is none):

def findPaths(name, path):
    for root, dirs, files in os.walk(path):
        if name in files:
            return os.path.join(root, name)

## can search the object returned for the string you want to find easily

## Similar, but this will match a pattern (i.e. does not have to be exact file name match).
## Example usage: findMatch('*.txt', '/path/to/dir')

import fnmatch

def findMatch(pattern, path):
    result = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if fnmatch.fnmatch(name, pattern):
                result.append(os.path.join(root, name))
    return result
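
The difference between exact-name matching and pattern matching can be tried out on a throwaway directory tree. The following is only a sketch: find_match and demo are hypothetical names, and the temporary files exist purely for illustration. It uses fnmatch.filter, which applies a shell-style pattern to a whole list of names at once.

```python
import fnmatch
import os
import tempfile


def find_match(pattern, path):
    """Collect every file under path whose name matches the shell-style pattern."""
    result = []
    for root, dirs, files in os.walk(path):
        # fnmatch.filter keeps only the names that match the pattern.
        for name in fnmatch.filter(files, pattern):
            result.append(os.path.join(root, name))
    return result


def demo():
    # Build a small temporary tree so the example is self-contained.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "project", "sub"))
        for rel in ("project/file.xml", "project/sub/other.xml", "project/notes.txt"):
            open(os.path.join(tmp, *rel.split("/")), "w").close()
        matches = find_match("*.xml", tmp)
        return sorted(os.path.basename(m) for m in matches)


if __name__ == "__main__":
    print(demo())
```

An empty list from find_match means nothing matched, so a single "Not found" message can be printed by testing the result once after the walk has finished.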

huangapple
  • Posted on June 1, 2023 at 22:53:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76383189.html