Using Regex to match the beginning portion of filenames in a list
Question
I'm bad at regular expressions, so I was hoping to get some feedback on this particular regex.
I have a list of filenames, obtained from the os library's listdir method. The filenames will start with text like "D#######". So there may be D0000001.txt, D0000000.svg, D0000003.stl, etc. If the list is something like:
['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg', 'D0000003.stl', 'D0000003.doc']
I want to find all strings in the list that begin with 'D0000002'.
Will the regex 'D0000002*\.[a-zA-Z]{3}'
always return a match object ONLY for 'D0000002.txt' and 'D0000002.svg', and nothing else, or is it possible this pattern could match other values?
Note that there is no guarantee that the part of the filename preceding the extension will contain only the "D" + 7 digits. So it is possible for filenames like "D1234567_someMoreText_20230720.abc" to exist. And if the pattern is:
'D1234567*\.[a-zA-Z]{3}'
the file noted should result in a match.
BTW, the comparison logic iterates through the list of filenames, performing re.search() on each string in the list. That iteration adds matching names to a "matching files" list for return.
Thanks!
Answer 1
Score: -1
In regex:
* matches the previous token zero or more times. For example, a* matches '', 'a', 'aa', and so on.
. matches any one character. For example, . matches 'a', 'b', 'c', '1', 'Z', etc.
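To see what the * in the question's pattern actually does, here is a small sketch. The sample filenames are made up for illustration and are not from the question; only re.search from the standard library is used.

```python
import re

# The pattern from the question: the * applies only to the final "2",
# so it means "zero or more 2s", not "any text".
pattern = re.compile(r'D0000002*\.[a-zA-Z]{3}')

# Hypothetical filenames, chosen to show the unintended matches.
names = [
    'D0000002.txt',
    'D000000.txt',         # zero 2s before the dot
    'D00000022.svg',       # two 2s before the dot
    'D0000002_notes.txt',  # extra text before the extension
    'D0000003.stl',
]

matched = [name for name in names if pattern.search(name)]
print(matched)
```

With these names, the pattern accepts D000000.txt and D00000022.svg but rejects D0000002_notes.txt, which is the opposite of what the question asks for.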
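Thus, in your pattern D0000002*\.[a-zA-Z]{3}, the * applies only to the final 2: the pattern means "D000000, then zero or more 2s, then a literal dot, then three letters". With re.search it would therefore also match names such as D000000.txt or D00000022.txt.
If you write D0000002\.[a-zA-Z]{3} instead, that matches:
- The string D0000002.
- Any three letters (lowercase or uppercase)
However, with re.search this will also match filenames like hi_D0000002.txt_hello.
To prevent this, you can add ^ and $ at the start and end of the regex; they match the start of the string and the end of the string respectively.
In conclusion, ^D0000002\.[a-zA-Z]{3}$ should work for names of exactly that form.
It means that the entire filename is D0000002.(letter)(letter)(letter), so a name with extra text before the extension, such as D0000002_someMoreText.txt, will not match it.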
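A quick sketch of how the anchors change the result. The third pattern is an assumption on my part rather than something from the answer: it adds .* after the prefix so that names with extra text before the extension, like the one in the question, are still accepted.

```python
import re

unanchored = re.compile(r'D0000002\.[a-zA-Z]{3}')
anchored = re.compile(r'^D0000002\.[a-zA-Z]{3}$')
# Assumed variant: allow any text between the prefix and the extension.
prefix_only = re.compile(r'^D0000002.*\.[a-zA-Z]{3}$')

# Hypothetical filenames for illustration.
names = [
    'D0000002.txt',
    'hi_D0000002.txt_hello',
    'D0000002_someMoreText_20230720.abc',
    'D0000003.stl',
]

unanchored_hits = [n for n in names if unanchored.search(n)]
anchored_hits = [n for n in names if anchored.search(n)]
prefix_hits = [n for n in names if prefix_only.search(n)]

print(unanchored_hits)
print(anchored_hits)
print(prefix_hits)
```

Only the .* variant keeps D0000002_someMoreText_20230720.abc while still rejecting hi_D0000002.txt_hello.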
P.S.
re.fullmatch requires the entire string to match, re.match only requires a match starting at the beginning of the string, and re.search looks for a match anywhere in the string.
So, you may want to write
re.fullmatch(r'D0000002\.[a-zA-Z]{3}', filename)
instead of
re.search(r'^D0000002\.[a-zA-Z]{3}$', filename)
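Putting this together for the list in the question, a minimal sketch might look like the following. The function name files_with_prefix is hypothetical, and the .* after the prefix is my assumption, added so that names with extra text before the extension still match.

```python
import re

def files_with_prefix(filenames, prefix):
    """Return names that start with `prefix` and end in a dot plus three letters."""
    # re.escape keeps any regex metacharacters in the prefix literal.
    pattern = re.compile(re.escape(prefix) + r'.*\.[a-zA-Z]{3}')
    # fullmatch requires the whole name to match, so ^ and $ are not needed.
    return [name for name in filenames if pattern.fullmatch(name)]

filenames = ['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt',
             'D0000002.svg', 'D0000003.stl', 'D0000003.doc']
result = files_with_prefix(filenames, 'D0000002')
print(result)
```

If the three-letter extension check is not actually needed, name.startswith(prefix) does the same prefix test without any regex.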