Why is the order of directories listed by os.walk() the same irrespective of the "topdown" parameter?
Question
I have a dataset in a specific root directory and am trying to iterate through its directories and files, but the `topdown` parameter does not work as expected.
```
Images
├── n01440764
│   ├── image1.jpg
│   ├── ...
│   └── image50.jpg
└── n01443537
    ├── image1.jpg
    ├── ...
    └── image50.jpg
```
```python
import os

image_dir = os.walk("Images", topdown=False)
for root, dirs, files in image_dir:
    for d in dirs:
        print(d)
```
Output:

```
n01443537
n01440764
```

Whether `topdown=False` or `topdown=True`, the result is the same. But I am expecting:

```
n01440764
n01443537
```
Answer 1
Score: 1
The `topdown` option does not change the order in which directories at the same level are walked. Instead, it determines whether the `(dirpath, dirnames, filenames)` tuple for a directory is generated before or after the tuples for its subdirectories. With the default (`topdown=True`), you can modify `dirnames` in place to filter out some paths so that they are not walked. With `topdown=False`, the subdirectories are walked first, so that sort of filtering is not possible. In your example, `Images` is the only directory that has subdirectories, so the only non-empty `dirs` list your loop prints is the same list, in the same order, in both modes.
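A minimal sketch of the before/after distinction, assuming a throwaway tree built with `tempfile` (the directory names `a` and `a/b` are hypothetical). A single chain of directories is used so that the output does not depend on the arbitrary order of siblings:

```python
import os
import tempfile


def walk_order(top, topdown):
    """Return each dirpath yielded by os.walk(), relative to top."""
    return [
        os.path.relpath(dirpath, top)
        for dirpath, dirnames, filenames in os.walk(top, topdown=topdown)
    ]


with tempfile.TemporaryDirectory() as top:
    # Hypothetical chain top/a/b: no siblings, so no arbitrary ordering
    os.makedirs(os.path.join(top, "a", "b"))
    top_down = walk_order(top, topdown=True)
    bottom_up = walk_order(top, topdown=False)

print(top_down)   # ['.', 'a', 'a/b'] on POSIX: parent before children
print(bottom_up)  # ['a/b', 'a', '.'] on POSIX: children before parent
```

Only the position of a parent relative to its children changes; siblings would still come out in whatever order the directory listing returns.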
From the official documentation for `os.walk()`:
> If optional argument topdown is True or not specified, the triple for
> a directory is generated before the triples for any of its
> subdirectories (directories are generated top-down). If topdown is
> False, the triple for a directory is generated after the triples for
> all of its subdirectories (directories are generated bottom-up). No
> matter the value of topdown, the list of subdirectories is retrieved
> before the tuples for the directory and its subdirectories are
> generated.
>
> When topdown is True, the caller can modify the dirnames list in-place
> (perhaps using del or slice assignment), and walk() will only recurse
> into the subdirectories whose names remain in dirnames; this can be
> used to prune the search, impose a specific order of visiting, or even
> to inform walk() about directories the caller creates or renames
> before it resumes walk() again. Modifying dirnames when topdown is
> False has no effect on the behavior of the walk, because in bottom-up
> mode the directories in dirnames are generated before dirpath itself
> is generated.
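The pruning behavior described in the documentation can be checked with a small sketch. It assumes a hypothetical tree containing `keep/` and `skip/inner/`, and removes `skip` from `dirnames` by slice assignment in both modes; the visited paths are sorted so the result does not depend on sibling order:

```python
import os
import tempfile


def visited(top, topdown):
    """Walk top, pruning any directory named 'skip'; return sorted relative dirpaths."""
    seen = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=topdown):
        seen.append(os.path.relpath(dirpath, top))
        # Slice assignment mutates the list object os.walk() holds,
        # which is what lets it skip 'skip' in top-down mode
        dirnames[:] = [d for d in dirnames if d != "skip"]
    # Sort so the result does not depend on the arbitrary sibling order
    return sorted(seen)


with tempfile.TemporaryDirectory() as top:
    # Hypothetical tree: top/keep and top/skip/inner
    os.makedirs(os.path.join(top, "keep"))
    os.makedirs(os.path.join(top, "skip", "inner"))
    pruned = visited(top, topdown=True)
    not_pruned = visited(top, topdown=False)

print(pruned)      # ['.', 'keep']
print(not_pruned)  # ['.', 'keep', 'skip', 'skip/inner'] on POSIX
```

In bottom-up mode, `skip` and `skip/inner` have already been yielded by the time the root's `dirnames` is modified, so the same filter changes nothing.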
As of Python 3.5, `os.walk()` uses `os.scandir()`, which yields directory entries in arbitrary order. Prior to Python 3.5, it used `os.listdir()`, whose documentation states the same:

> The list is in arbitrary order, and does not include the special
> entries '.' and '..' even if they are present in the directory.

See also the answer to this question:
https://stackoverflow.com/questions/18282370/in-what-order-does-os-walk-iterates-iterate
You can get the directories in the order you want by iterating over `sorted(dirs)` (for example, `for d in sorted(dirs):`) within your `os.walk()` loop.
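Applied to the question's layout, a sketch of the `sorted(dirs)` approach might look like this. It rebuilds a hypothetical `Images/` tree in a temporary directory (without the image files); the helper name `sorted_dir_names` is illustrative only:

```python
import os
import tempfile


def sorted_dir_names(top, topdown=True):
    """Collect subdirectory names like the question's loop, but in sorted order."""
    names = []
    for root, dirs, files in os.walk(top, topdown=topdown):
        # sorted() builds a new list, so this works the same for either
        # value of topdown; 'd' is used instead of 'dir' to avoid
        # shadowing the built-in dir()
        for d in sorted(dirs):
            names.append(d)
    return names


with tempfile.TemporaryDirectory() as tmp:
    images = os.path.join(tmp, "Images")
    # Hypothetical stand-in for the question's tree (image files omitted)
    for name in ("n01443537", "n01440764"):
        os.makedirs(os.path.join(images, name))
    by_topdown = sorted_dir_names(images, topdown=True)
    by_bottomup = sorted_dir_names(images, topdown=False)

print(by_topdown)   # ['n01440764', 'n01443537']
print(by_bottomup)  # ['n01440764', 'n01443537']
```

Because `sorted()` returns a new list, it only changes what the loop sees. To make `os.walk()` itself descend into subdirectories in sorted order, sort the list in place with `dirs.sort()` while using `topdown=True`; as the documentation quoted above notes, modifying `dirnames` has no effect in bottom-up mode.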