March 1, 2023 12:38:39

Path ordering based on particular criteria

Question

I have four files (or any number of files for that matter) named

file_V2023.2.2_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_0.txt
file_V2023.2.3_1.txt

If I do

from pathlib import Path
output_path = Path("./")
for video_path in sorted(output_path.glob("*.txt")):
    print(video_path)

I get the order above.

Is there a way I can get the following order:

file_V2023.2.2_0.txt
file_V2023.2.3_0.txt
file_V2023.2.2_1.txt
file_V2023.2.3_1.txt

Answer 1

Score: 1

The sorted() function has a key parameter, and you give that a function that provides a sort key for the things you're sorting.

So:

import re
names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
    'file_V2023.2.3_1.txt'
]
# Raw string, so \d and \. reach the regex engine unchanged
name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')
def get_key(name):
    a, b, c = name_pattern.match(name).groups()
    return int(a), int(c), int(b)  # reordering here
print(sorted(names, key=get_key))

Output:

['file_V2023.2.2_0.txt', 'file_V2023.2.3_0.txt', 'file_V2023.2.2_1.txt', 'file_V2023.2.3_1.txt']

The regular expression is really useful for breaking up the name, since it also works for names like file_V2023.10.2_99.txt. The parts of the regex enclosed in parentheses are matched as separate groups, which are then retrieved with .groups(); since there are three of them, they can be unpacked into a, b, c.
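As a quick check of how the groups come out, the sketch below runs the same pattern against the longer name mentioned above. The variable names match and groups are only illustrative.

```python
import re

name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')

# The greedy .* swallows 'file_V2023'; the three groups capture the rest
match = name_pattern.match('file_V2023.10.2_99.txt')
groups = match.groups()
print(groups)  # ('10', '2', '99')

# Three groups, so tuple unpacking into three names works
a, b, c = groups
```

The groups are still strings at this point, which is why the key function converts them before returning.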

The matched numeric strings (for 'file_V2023.2.3_1.txt' they would be '2', '3', and '1') are converted to int so that something like '19' ends up after '2' rather than before it. Compared as strings, '19' sorts first because its leading '1' comes before '2'.
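A small illustration of that difference, using a made-up list of number strings:

```python
minor_versions = ['2', '19', '3']  # hypothetical values, as the regex would capture them

as_strings = sorted(minor_versions)            # compared character by character
as_numbers = sorted(minor_versions, key=int)   # compared by numeric value

print(as_strings)  # ['19', '2', '3']
print(as_numbers)  # ['2', '3', '19']
```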

Note that the call to re.compile is there only for efficiency. This way, the regex only has to be compiled once, instead of every time the function is called. But if you want the code to be shorter, or avoid accessing the global, this would do the same:

def get_key(name):
    a, b, c = re.match(r'.*\.(\d+)\.(\d+)_(\d+)\.txt', name).groups()
    return int(a), int(c), int(b)  # reordering here

Also note that this example assumes the numbers are the only thing you're sorting by, so only a 3-tuple of integer values is returned as a sorting key. If you have names like 'afile_V2023.2.3_0.txt' and 'bfile_V2023.2.2_0.txt', and you want the 'afile' to come before the 'bfile' in spite of their numbering, this works:

name_pattern = re.compile(r'(.*)\.(\d+)\.(\d+)_(\d+)\.txt')
def get_key(name):
    # t is everything before the last two numbers, e.g. 'afile_V2023'
    t, a, b, c = name_pattern.match(name).groups()
    return t, int(a), int(c), int(b)

That is, you can of course mix types in the sort key, as long as Python knows how to order them.
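To tie this back to the pathlib loop in the question, here is a minimal sketch that applies the same key to Path objects. The helper name sorted_txt_files is hypothetical, and skipping files that do not match the pattern is an assumption made here; the key function on its own would raise on such names.

```python
import re
from pathlib import Path

name_pattern = re.compile(r'.*\.(\d+)\.(\d+)_(\d+)\.txt')

def get_key(path):
    # path.name drops the directory part, leaving e.g. 'file_V2023.2.3_1.txt'
    a, b, c = name_pattern.match(path.name).groups()
    return int(a), int(c), int(b)

def sorted_txt_files(directory):
    # Assumption: silently ignore .txt files that do not follow the naming scheme
    paths = [p for p in Path(directory).glob('*.txt')
             if name_pattern.match(p.name)]
    return sorted(paths, key=get_key)

for video_path in sorted_txt_files('./'):
    print(video_path)
```

Because the key only looks at path.name, the directory in which the files live does not influence the order.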

Answer 2

Score: 0

Actually given the file name design you have, there is no need to use regular expressions.

Simple:

names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]
names.sort(key=lambda name: float(name[11:-4]))
print(names)

Examples of casting:

>>> float("2.2_0")
2.2
>>> float("2.2_1")
2.21
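Note that the float key compares 2.2, 2.21, 2.3 and 2.31, so it orders by the version part first and the trailing index second. If the order requested in the question is the goal (trailing index first), a regex-free key can still be built with string splitting. This is only a sketch: split_key is a hypothetical helper, and it assumes every name follows the same layout as the four example files.

```python
def split_key(name):
    stem = name[:-len('.txt')]            # 'file_V2023.2.3_1'
    version, index = stem.rsplit('_', 1)  # 'file_V2023.2.3', '1'
    _, major, minor = version.split('.')  # 'file_V2023', '2', '3'
    # Trailing index goes before the last version number
    return int(major), int(index), int(minor)

names = [
    'file_V2023.2.2_0.txt',
    'file_V2023.2.3_1.txt',
    'file_V2023.2.3_0.txt',
    'file_V2023.2.2_1.txt',
]
names.sort(key=split_key)
print(names)
```

Converting each part to int also keeps a name such as file_V2023.2.10_0.txt after file_V2023.2.9_0.txt, which a single float would not.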
