Path ordering based on particular criteria

Question

I have four files (or any number of files for that matter) named

file_V2023.2.2_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_0.txt
file_V2023.2.3_1.txt

If I do

from pathlib import Path
output_path = Path("./")
for video_path in sorted(output_path.glob("*.txt")):
    print(video_path)

I get the order above.

Is there a way I can get the following order:

file_V2023.2.2_0.txt
file_V2023.2.3_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_1.txt

Answer 1

Score: 1

The sorted() function has a key parameter: you pass it a function that returns a sort key for each of the things you're sorting.

So:

import re


names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
    'file_V2023.2.3_1.txt'
]


# Raw string, so that \d and \. reach the regex engine unchanged.
name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')
def get_key(name):
    a, b, c = name_pattern.match(name).groups()
    return int(a), int(c), int(b)  # reordering here


print(sorted(names, key=get_key))

Output:

['file_V2023.2.2_0.txt', 'file_V2023.2.3_0.txt', 'file_V2023.2.2_1.txt', 'file_V2023.2.3_1.txt']
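
Since the question iterates over Path objects rather than plain strings, the same key can be applied to each path's name attribute. This is a minimal sketch: get_path_key is a hypothetical helper name, the list of paths stands in for output_path.glob("*.txt"), and it assumes every file name matches the pattern.

```python
import re
from pathlib import Path

name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')

def get_path_key(path):
    # Hypothetical helper: same key as get_key, applied to the file name only.
    a, b, c = name_pattern.match(path.name).groups()
    return int(a), int(c), int(b)

# Stand-in for output_path.glob("*.txt"); nothing is read from disk here.
paths = [
    Path('file_V2023.2.2_0.txt'),
    Path('file_V2023.2.2_1.txt'),
    Path('file_V2023.2.3_0.txt'),
    Path('file_V2023.2.3_1.txt'),
]

ordered = sorted(paths, key=get_path_key)
for video_path in ordered:
    print(video_path)
```

With a real directory, sorted(output_path.glob("*.txt"), key=get_path_key) would be the equivalent call.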

The regular expression is really useful for breaking up the name, since it also works for names with multi-digit parts like file_V2023.10.2_99.txt. The parts of the regex that are enclosed in parentheses are matched as separate groups, which are then retrieved with .groups(), and since there are three of them, they can be unpacked into a, b, c.
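
To see what the groups look like, here is a small sketch using the multi-digit name mentioned above. The name 'notes.txt' is a made-up example of a file that does not fit the scheme: for such a name match() returns None, so calling .groups() on the result would raise AttributeError.

```python
import re

name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')

# Three parenthesised groups give a 3-tuple of strings.
groups = name_pattern.match('file_V2023.10.2_99.txt').groups()
print(groups)

# A name that does not follow the scheme gives no match at all.
no_match = name_pattern.match('notes.txt')
print(no_match)
```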

The matched numeric strings (e.g., for 'file_V2023.2.3_1.txt' they would be '2', '3', and '1') are converted to int to make sure that something like '19' ends up after '2' instead of before it. Compared as strings, '19' comes first, because the '1' it starts with sorts before '2'.
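
The difference between the two comparisons can be checked directly. A minimal sketch with made-up values:

```python
numbers = ['2', '19', '10']

# Compared as strings, character by character: '1...' sorts before '2'.
as_strings = sorted(numbers)

# Compared by numeric value.
as_ints = sorted(numbers, key=int)

print(as_strings)
print(as_ints)
```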

Note that the call to re.compile is there only for efficiency. This way, the pattern is compiled once up front instead of being looked up on every call (the re module caches recently compiled patterns, so the gain is modest). But if you want the code to be shorter, or want to avoid accessing the global, this would do the same:

def get_key(name):
    a, b, c = re.match(r'.*\.(\d+)\.(\d+)_(\d+)\.txt', name).groups()
    return int(a), int(c), int(b)  # reordering here

Also note that this example assumes the numbers are the only thing you're sorting by, so only a 3-tuple of integer values is returned as a sorting key. If you have names like 'afile_V2023.2.3_0.txt' and 'bfile_V2023.2.2_0.txt', and you want the 'afile' to come before the 'bfile' in spite of their numbering, this works:

# The extra first group captures everything before the version numbers.
name_pattern = re.compile(r'(.*)\.(\d+)\.(\d+)_(\d+)\.txt')
def get_key(name):
    t, a, b, c = name_pattern.match(name).groups()
    return t, int(a), int(c), int(b)

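That is, you can of course mix types in the sort key, as long as Python knows how to order them: tuples are compared position by position, so the values at each position must be comparable with one another.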
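
A short sketch of both sides of that statement. The three names are invented for illustration; the second part shows what happens when the same tuple position holds an int in one key and a str in another.

```python
import re

name_pattern = re.compile(r'(.*)\.(\d+)\.(\d+)_(\d+)\.txt')

def get_key(name):
    t, a, b, c = name_pattern.match(name).groups()
    return t, int(a), int(c), int(b)

names = [
    'bfile_V2023.2.2_0.txt',
    'afile_V2023.2.3_0.txt',
    'afile_V2023.2.2_1.txt',
]

# The string prefix is compared first, then the reordered integers.
ordered = sorted(names, key=get_key)
print(ordered)

# Keys whose positions disagree on type cannot be ordered.
try:
    sorted([(1, 'a'), ('b', 2)])
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)
```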

Answer 2

Score: 0

Actually, given the file name design you have, there is no need to use regular expressions.

Simple:

names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]


names.sort(key=lambda name: float(name[11:-4]))

print(names)

Examples of casting:

>>> float("2.2_0")
2.2
>>> float("2.2_1")
2.21
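
Note that these float keys come out as 2.2, 2.21, 2.3, 2.31, so sorting by them puts the version before the suffix, which is the default order rather than the one requested in the question. If the suffix should take priority, a regex-free key can be sketched with str.split instead. get_split_key is a hypothetical name, and the sketch assumes every name has the form '<prefix>.<a>.<b>_<c>.txt' with no other dots or trailing underscores.

```python
def get_split_key(name):
    # Hypothetical regex-free key for names like 'file_V2023.2.2_0.txt'.
    stem = name[:-len('.txt')]               # 'file_V2023.2.2_0'
    version, suffix = stem.rsplit('_', 1)    # 'file_V2023.2.2', '0'
    _, a, b = version.split('.')             # 'file_V2023', '2', '2'
    return int(a), int(suffix), int(b)       # suffix before the last version part

names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]

by_split = sorted(names, key=get_split_key)
by_float = sorted(names, key=lambda name: float(name[11:-4]))

print(by_split)
print(by_float)
```

Converting each part to int also keeps a name such as 'file_V2023.2.10_0.txt' after 'file_V2023.2.2_0.txt', whereas float("2.10_0") is smaller than float("2.2_0").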

huangapple
  • Published on March 1, 2023 at 12:38:39
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75599626.html