Using Regex to match the beginning portion of filenames in a list
Question
I'm not good with regular expressions, so I was hoping to get some feedback on this particular pattern.
I have a list of filenames obtained from the os library's listdir method. The filenames start with text like "D#######", so there may be D0000001.txt, D0000000.svg, D0000003.stl, etc. Suppose the list is something like:
['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg', 'D0000003.stl', 'D0000003.doc']
I want to find all strings in the list that begin with 'D0000002'.
Will the regex 'D0000002*\.[a-zA-Z]{3}' always return a match object ONLY for 'D0000002.txt' and 'D0000002.svg', and nothing else, or could this pattern match other values?
Note that there is no guarantee that the part of the filename before the extension contains only "D" plus seven digits. Filenames like "D1234567_someMoreText_20230720.abc" can exist, and with the pattern:
'D1234567*\.[a-zA-Z]{3}'
that file should result in a match.
BTW, the comparison logic iterates through the list of filenames, calling re.search() on each string, and adds the matching names to a "matching files" list that is returned.
Thanks!
Answer 1
Score: -1
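In regex:
'*' matches the previous token zero or more times. For example, 'a*' matches '', 'a', 'aa', and so on.
'.' matches any single character (except a newline, by default). For example, '.' matches 'a', 'b', 'c', '1', 'Z', etc.
In your pattern, the '*' applies only to the '2' in front of it, so 'D0000002*' means 'D000000' followed by zero or more '2' characters. It does not mean 'D0000002' followed by anything; that would be 'D0000002.*'.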
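To see how the '*' binds to the preceding '2', here is a small sketch that runs the pattern from the question through re.search(). Apart from 'D0000002.txt' and 'D0000001.txt', the sample names are made up for illustration.

```python
import re

# The pattern from the question: '2*' means "zero or more '2' characters".
pattern = re.compile(r'D0000002*\.[a-zA-Z]{3}')

# Sample names; most are hypothetical and chosen to probe edge cases.
names = [
    'D0000002.txt',        # the intended match
    'D000000.txt',         # zero '2' characters still satisfies '2*'
    'D00000022.svg',       # two '2' characters also satisfy '2*'
    'D0000002.xlsx',       # search() stops after 'xls'; nothing anchors the end
    'D0000002_notes.txt',  # '_' appears where the pattern needs '2' or '.'
    'D0000001.txt',        # '1' appears where the pattern needs '2' or '.'
]

# Map each name to whether re.search() finds the pattern somewhere in it.
results = {name: pattern.search(name) is not None for name in names}

for name, matched in results.items():
    print(f'{name}: {matched}')
```

The first four names match and the last two do not, so the pattern is both too loose (it accepts 'D000000.txt') and too strict (it rejects names with extra text before the extension).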
Thus, if you write 'D0000002\.[a-zA-Z]{3}', that matches:
- The string 'D0000002.'
- Any three letters (lowercase or uppercase)
However, with re.search() this also matches inside longer names such as hi_D0000002.txt_hello.
To prevent this, add '^' and '$' at the start and end of the pattern; they match the start and end of the string, respectively.
In conclusion, '^D0000002\.[a-zA-Z]{3}$' should work when the entire filename is D0000002.(letter)(letter)(letter).
If extra text may follow the prefix, as in your D1234567_someMoreText_20230720.abc example, put '.*' before the escaped dot: '^D0000002.*\.[a-zA-Z]{3}$'.
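As a rough sketch of the filtering loop described in the question, the helper below builds a pattern from a prefix and keeps only the names that match in full. The function name filter_by_prefix and the two extra sample names are hypothetical; re.escape() is used on the assumption that a prefix might one day contain regex metacharacters.

```python
import re


def filter_by_prefix(filenames, prefix):
    """Return names made of prefix + any text + '.' + a three-letter extension."""
    # re.escape() keeps the prefix literal; '.*' allows extra text before the dot.
    pattern = re.compile(re.escape(prefix) + r'.*\.[a-zA-Z]{3}')
    # fullmatch() requires the whole name to fit, so no '^' or '$' is needed.
    return [name for name in filenames if pattern.fullmatch(name)]


files = [
    'D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg',
    'D0000003.stl', 'D0000003.doc',
    'D1234567_someMoreText_20230720.abc',  # extra text before the extension
    'hi_D0000002.txt_hello',               # prefix is not at the start
]

matching = filter_by_prefix(files, 'D0000002')
print(matching)

with_suffix = filter_by_prefix(files, 'D1234567')
print(with_suffix)
```

Note that because of the '.*', a name such as 'D00000021.txt' would also be accepted for the prefix 'D0000002', since it does begin with that text. If the prefix must be followed by a non-digit, the pattern would need a further restriction.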
P.S.
re.match() only anchors the match at the start of the string, re.fullmatch() requires the entire string to match, and re.search() looks for a match anywhere in the string.
So, you may want to write
re.fullmatch(r'D0000002\.[a-zA-Z]{3}', filename)
instead of
re.search(r'^D0000002\.[a-zA-Z]{3}$', filename)
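The difference between the three functions is easy to check directly. The sketch below applies the same unanchored pattern with re.search(), re.match(), and re.fullmatch(); the helper name check and the sample names are only for illustration.

```python
import re

# Same pattern for all three calls, with no '^' or '$' anchors.
pattern = re.compile(r'D0000002\.[a-zA-Z]{3}')


def check(name):
    """Return (search, match, fullmatch) results as booleans for one name."""
    return (
        pattern.search(name) is not None,     # match anywhere in the string
        pattern.match(name) is not None,      # match must start at index 0
        pattern.fullmatch(name) is not None,  # match must cover the whole string
    )


for name in ['D0000002.txt', 'D0000002.txt_hello', 'hi_D0000002.txt']:
    print(name, check(name))
```

Only re.fullmatch() rejects 'D0000002.txt_hello', which is why it is the closest equivalent to wrapping the pattern in '^' and '$'.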