Path ordering based on particular criteria
Question
I have four files (or any number of files for that matter) named
file_V2023.2.2_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_0.txt
file_V2023.2.3_1.txt
If I run
from pathlib import Path

output_path = Path("./")
for video_path in sorted(output_path.glob("*.txt")):
    print(video_path)
I get the order above.
Is there a way to get the following order instead:
file_V2023.2.2_0.txt
file_V2023.2.3_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_1.txt
Answer 1
Score: 1
The sorted() function has a key parameter: you pass it a function that returns a sort key for each item being sorted.
So:
import re

names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
    'file_V2023.2.3_1.txt',
]

# Raw string, so \d and \. reach the regex engine unchanged
name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')


def get_key(name):
    a, b, c = name_pattern.match(name).groups()
    return int(a), int(c), int(b)  # reordering here


print(sorted(names, key=get_key))
Output:
['file_V2023.2.2_0.txt', 'file_V2023.2.3_0.txt', 'file_V2023.2.2_1.txt', 'file_V2023.2.3_1.txt']
The regular expression is really useful for breaking up the name, since it also works for names like file_V2023.10.2_99.txt. The parts of the regex enclosed in parentheses are matched as separate groups, which are then retrieved with .groups(). Since there are three of them, they can be unpacked into a, b, c.
The matched numeric strings (for 'file_V2023.2.3_1.txt' they would be '2', '3', and '1') are converted to int so that something like '19' ends up after '2' instead of before it. Compared as strings, '19' would come first, because the '1' it starts with sorts before '2'.
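To make the difference concrete, here is a minimal sketch comparing a plain string sort with a sort that uses int as the key. The sample values are made up for illustration and do not come from the question.

```python
# Hypothetical version fragments, kept as strings as a regex would return them
versions = ['2', '19', '10', '3']

# Default sort compares character by character, so '10' and '19' come before '2'
as_strings = sorted(versions)

# Converting each item to int for the key gives numeric order
as_ints = sorted(versions, key=int)

print(as_strings)  # ['10', '19', '2', '3']
print(as_ints)     # ['2', '3', '10', '19']
```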
Note that the call to re.compile is there only for efficiency: the regex is compiled once up front instead of being processed again on every call to the function (the re module does cache recently used patterns, so the difference is usually small). If you want the code to be shorter, or want to avoid accessing the global, this does the same:
def get_key(name):
    a, b, c = re.match(r'.*\.(\d+)\.(\d+)_(\d+)\.txt', name).groups()
    return int(a), int(c), int(b)  # reordering here
Also note that this example assumes the numbers are the only thing you're sorting by, so the sort key is just a 3-tuple of integers. If you have names like 'afile_V2023.2.3_0.txt' and 'bfile_V2023.2.2_0.txt', and you want 'afile' to come before 'bfile' in spite of their numbering, this works:
name_pattern = re.compile(r'(.*)\.(\d+)\.(\d+)_(\d+)\.txt')


def get_key(name):
    t, a, b, c = name_pattern.match(name).groups()
    return t, int(a), int(c), int(b)  # text prefix first, then the numbers
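That is, you can of course mix types in the sort key, as long as Python knows how to order them: tuples are compared position by position, so the values in each position must be comparable with one another.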
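Since the question sorts Path objects from glob() rather than plain strings, the same idea can be applied to path.name. The following is only a sketch: the names NAME_PATTERN and path_key, the temporary directory, and the extra notes.txt file are assumptions made for illustration. It also shows one possible way to handle names that do not match the pattern, by placing them last instead of raising an error.

```python
import re
import tempfile
from pathlib import Path

# Same pattern as in the answer, written as a raw string
NAME_PATTERN = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')


def path_key(path):
    """Sort key for a Path; non-matching names go last, ordered by name."""
    match = NAME_PATTERN.fullmatch(path.name)
    if match is None:
        return (1, 0, 0, 0, path.name)
    a, b, c = map(int, match.groups())
    return (0, a, c, b, path.name)  # c before b, as in the answer


# Build a throwaway directory so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in [
        'file_V2023.2.2_0.txt',
        'file_V2023.2.2_1.txt',
        'file_V2023.2.3_0.txt',
        'file_V2023.2.3_1.txt',
        'notes.txt',  # hypothetical file that does not follow the pattern
    ]:
        (folder / name).touch()
    ordered = [p.name for p in sorted(folder.glob('*.txt'), key=path_key)]

for name in ordered:
    print(name)
```

Every key is a tuple with the same layout (four integers followed by a string), so matching and non-matching names can be compared without a TypeError.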
Answer 2
Score: 0
Actually, given the file name design you have, there is no need to use regular expressions.
A simple version:
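names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]
# name[11:-4] drops the 11-character prefix 'file_V2023.' and the '.txt'
# suffix, leaving e.g. '2.2_0'
names.sort(key=lambda name: float(name[11:-4]))
print(names)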
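Examples of casting (since Python 3.6, float() accepts underscores as digit separators, so the underscore is simply ignored):
>>> float("2.2_0")
2.2
>>> float("2.2_1")
2.21
Note that this key orders the names as 2.2_0, 2.2_1, 2.3_0, 2.3_1, which is the default sorted order rather than the order requested in the question.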
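If the goal is to avoid regular expressions while still getting the order requested in the question, one possible sketch uses plain string splitting. The helper name split_key is made up for illustration, and the code assumes every name has the shape prefix.major.minor_suffix.txt; names that deviate from that shape would raise a ValueError.

```python
names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]


def split_key(name):
    """Sort key built with string methods only (hypothetical helper)."""
    stem, _, _ext = name.rpartition('.')          # 'file_V2023.2.2_0'
    version, _, suffix = stem.rpartition('_')     # 'file_V2023.2.2', '0'
    _prefix, major, minor = version.rsplit('.', 2)
    # Suffix before minor, matching the order asked for in the question
    return int(major), int(suffix), int(minor)


result = sorted(names, key=split_key)
print(result)
```

Unlike the fixed slice name[11:-4], this does not depend on the prefix being exactly 11 characters long.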