July 20, 2023, 22:11:26

Using Regex to match the beginning portion of filenames in a list

Question

I'm not good with regular expressions, so I was hoping to get some feedback on this particular one. I have a list of file names, obtained from the os library's listdir method. The filenames start with text like "D#######", so there may be D0000001.txt, D0000000.svg, D0000003.stl, etc. Suppose the list is something like:

['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg', 'D0000003.stl', 'D0000003.doc']

I want to get all strings in the list that begin with 'D0000002'.

Will the regex 'D0000002*\.[a-zA-Z]{3}' always return a match object ONLY for 'D0000002.txt' and 'D0000002.svg', and nothing else, or could this pattern match other values?

Note that there is no guarantee that the part of the filename preceding the extension contains only the "D" + 7 digits. Filenames like "D1234567_someMoreText_20230720.abc" may exist, and if the pattern is 'D1234567*\.[a-zA-Z]{3}', that file should result in a match.

BTW, the comparison logic iterates through the list of filenames, performing the re.search() on each string in the list. That iteration adds matching names to a "matching files" list for return.
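
A minimal sketch of the loop described above, under a few assumptions: the function name find_matching_files is hypothetical, and the literal list from this question stands in for the os.listdir result so that the example runs on its own.

```python
import re

def find_matching_files(filenames, pattern):
    """Return the names in which re.search finds the pattern anywhere."""
    compiled = re.compile(pattern)
    matching_files = []
    for name in filenames:
        if compiled.search(name):
            matching_files.append(name)
    return matching_files

# Stand-in for os.listdir(some_directory); this is the list from the question.
filenames = ['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt',
             'D0000002.svg', 'D0000003.stl', 'D0000003.doc']

result = find_matching_files(filenames, r'D0000002*\.[a-zA-Z]{3}')
print(result)  # ['D0000002.txt', 'D0000002.svg']
```

For this particular list the pattern returns the two expected names, which does not by itself show that it is correct for other filenames.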

Thanks!

Answer 1

Score: -1

In regex:

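* matches the previous token zero or more times.
For example, a* matches '', 'a', 'aa' and so on.
. matches any single character (except a newline, by default).
For example, . matches 'a', 'b', 'c', '1', 'Z', etc.

So in your pattern, 2* means "zero or more 2s", not "D0000002 followed by anything": D0000002*\. matches D000000.txt but not D0000002_someMoreText.abc.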
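
These rules can be checked quickly with Python's re module. This is only an illustrative sketch; the sample names D000000.txt and D0000002_notes.abc are made up and do not come from the question.

```python
import re

# a* accepts zero or more 'a' characters, including the empty string.
star_results = [bool(re.fullmatch(r'a*', s)) for s in ['', 'a', 'aa', 'ab']]
print(star_results)  # [True, True, True, False]

# An unescaped . accepts any single character except a newline.
dot_results = [bool(re.fullmatch(r'.', s)) for s in ['a', '1', 'Z', '\n']]
print(dot_results)  # [True, True, True, False]

# In D0000002*, the * applies only to the final 2.
pattern = r'D0000002*\.[a-zA-Z]{3}'
zero_twos = bool(re.search(pattern, 'D000000.txt'))
extra_text = bool(re.search(pattern, 'D0000002_notes.abc'))
print(zero_twos)   # True: zero 2s followed directly by the dot
print(extra_text)  # False: '_' is neither a 2 nor the dot
```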

Thus, if you write D0000002\.[a-zA-Z]{3}, that matches:

The string D0000002.
Any three letters (lowercase or uppercase)

However, with re.search this will also match filenames like hi_D0000002.txt_hello.
To prevent this, you can add ^ and $ at the start and end of the regular expression; they match the start and the end of the string, respectively.

In conclusion, ^D0000002\.[a-zA-Z]{3}$ should work.
It means that the entire filename is D0000002.(letter)(letter)(letter), with no extra text before the extension.
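
The anchored pattern can be tried against a few names. The second pattern below is an assumption about how one might allow extra text between the prefix and the extension, as the question asks; it is not part of the answer itself. The name D0000002_someMoreText_20230720.abc is adapted from the question's example.

```python
import re

names = ['D0000002.txt', 'hi_D0000002.txt_hello',
         'D0000002_someMoreText_20230720.abc', 'D0000001.xlsx']

# Pattern from the answer: prefix, dot, exactly three letters, nothing else.
strict = re.compile(r'^D0000002\.[a-zA-Z]{3}$')
# Assumption: .* lets any characters sit between the prefix and the extension.
loose = re.compile(r'^D0000002.*\.[a-zA-Z]{3}$')

strict_hits = [n for n in names if strict.search(n)]
loose_hits = [n for n in names if loose.search(n)]
print(strict_hits)  # ['D0000002.txt']
print(loose_hits)   # ['D0000002.txt', 'D0000002_someMoreText_20230720.abc']
```

Because .* can also begin with a digit, the looser pattern would accept a name such as D00000029.txt as well. If only the prefix matters, filename.startswith('D0000002') may be a simpler test than a regex.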

P.S.

re.match only checks for a match at the beginning of the string, and re.fullmatch requires the entire string to match, while re.search looks for a match anywhere in the string.

So, you may want to write
re.fullmatch(r'D0000002\.[a-zA-Z]{3}', filename)
instead of
re.search(r'^D0000002\.[a-zA-Z]{3}$', filename)
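
The difference between the three functions can be checked directly. Note that re.match anchors only at the start of the string, so re.fullmatch is the one that requires the whole name to match. The sample names below are made up for illustration.

```python
import re

pattern = r'D0000002\.[a-zA-Z]{3}'
samples = ['D0000002.txt', 'D0000002.txt_hello', 'hi_D0000002.txt']

results = {}
for name in samples:
    results[name] = (
        bool(re.search(pattern, name)),     # match anywhere in the string
        bool(re.match(pattern, name)),      # match at the start only
        bool(re.fullmatch(pattern, name)),  # the whole string must match
    )
    print(name, results[name])
# D0000002.txt (True, True, True)
# D0000002.txt_hello (True, True, False)
# hi_D0000002.txt (True, False, False)
```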
