Using Regex to match the beginning portion of filenames in a list

Question

I'm bad at regular expressions, so I was hoping to get some feedback on this particular regex.

I have a list of filenames, obtained from the os library's listdir method. The filenames will start with text like "D#######". So there may be D0000001.txt, D0000000.svg, D0000003.stl, etc. If the list is something like:

['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt', 'D0000002.svg', 'D0000003.stl', 'D0000003.doc']

I want to get all strings in the list that begin with 'D0000002'.

Will the regex 'D0000002*\.[a-zA-Z]{3}' always ONLY return a match object for 'D0000002.txt' & 'D0000002.svg', and nothing else, or is it possible this pattern could match other values?

Note that there is no guarantee that the part of the filename preceding the extension will contain only the "D" + 7 digits. So it is possible for filenames like "D1234567_someMoreText_20230720.abc" to exist. And if the pattern is:
'D1234567*\.[a-zA-Z]{3}'
the file noted should result in a match.

BTW, the comparison logic iterates through the list of filenames, calling re.search() on each string in the list. Matching names are added to a "matching files" list, which is returned.

Thanks!

Answer 1

Score: -1

In regex:

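  • * matches the previous token zero or more times.
    For example, a* matches '', 'a', 'aa' and so on. In the question's pattern, 2* therefore means "zero or more 2s", not "2 followed by anything".
  • . matches any one character (except a newline, by default).
    For example, . matches 'a', 'b', 'c', '1', 'Z', etc.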
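
To see what the * in the question's pattern actually does, here is a small sketch. The filenames other than D0000002.txt and D0000003.stl are made up for illustration.

```python
import re

# Pattern from the question: the * applies only to the final "2",
# so it reads as "D000000", then zero or more "2"s, then a dot and 3 letters.
pattern = re.compile(r'D0000002*\.[a-zA-Z]{3}')

# Hypothetical filenames chosen to expose the unintended matches.
names = ['D0000002.txt', 'D000000.txt', 'D00000022.svg',
         'D0000002_notes.txt', 'D0000003.stl']

matched = [name for name in names if pattern.search(name)]
print(matched)  # ['D0000002.txt', 'D000000.txt', 'D00000022.svg']
```

So the pattern can match other values (D000000.txt, D00000022.svg), and it does not match a name with extra text before the extension (D0000002_notes.txt).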

Thus, if you write D0000002\.[a-zA-Z]{3}, that matches:

  • The literal string D0000002. (the \. matches a literal dot)
  • Any three letters (lowercase or uppercase)

However, with re.search this will also match filenames like hi_D0000002.txt_hello, because re.search accepts a match anywhere in the string.
To prevent this, add ^ at the start and $ at the end of the regex; they anchor the match to the start and end of the string respectively.

In conclusion, ^D0000002\.[a-zA-Z]{3}$ should work.
It means that the entire filename is D0000002.(letter)(letter)(letter). It will not match names that carry extra text before the extension.
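
A minimal sketch of the filtering loop described in the question, using the anchored pattern. The function name matching_files and the allow_extra_text flag are hypothetical, and the [^.]* variant is only one possible way to accept names like D1234567_someMoreText_20230720.abc; it assumes the extra text contains no dots.

```python
import re

def matching_files(filenames, prefix, allow_extra_text=False):
    """Return the names that start with `prefix` and end in a 3-letter extension."""
    # re.escape keeps any regex metacharacters in the prefix literal.
    # [^.]* optionally permits extra non-dot text before the extension.
    middle = r'[^.]*' if allow_extra_text else r''
    pattern = re.compile(r'^' + re.escape(prefix) + middle + r'\.[a-zA-Z]{3}$')
    return [name for name in filenames if pattern.search(name)]

files = ['D0000001.txt', 'D0000001.xlsx', 'D0000002.txt',
         'D0000002.svg', 'D0000003.stl', 'D0000003.doc']

strict = matching_files(files, 'D0000002')
print(strict)  # ['D0000002.txt', 'D0000002.svg']

# Filename with trailing text, taken from the question, plus a plain one.
extra = ['D1234567_someMoreText_20230720.abc', 'D1234567.abc']
relaxed = matching_files(extra, 'D1234567', allow_extra_text=True)
print(relaxed)  # both names match
```

One caveat with the relaxed form: it also accepts names where more digits follow the prefix (for example D12345678.abc), so it may need tightening if such names can occur.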

P.S.

re.match only checks for a match at the beginning of the string, re.search looks for a match anywhere in the string, and re.fullmatch requires the entire string to match.

So, you may want to write
re.fullmatch(r'D0000002\.[a-zA-Z]{3}', filename)
instead of
re.search(r'^D0000002\.[a-zA-Z]{3}$', filename)
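
The difference between the three functions can be checked directly. This is a small sketch; the test strings are hypothetical.

```python
import re

pattern = r'D0000002\.[a-zA-Z]{3}'  # no ^ or $ anchors

# re.search: a match anywhere in the string is enough.
search_hit = bool(re.search(pattern, 'hi_D0000002.txt_hello'))    # True

# re.match: anchored at the start only, so trailing text still passes.
match_prefix = bool(re.match(pattern, 'hi_D0000002.txt_hello'))   # False
match_trailing = bool(re.match(pattern, 'D0000002.txt_hello'))    # True

# re.fullmatch: the whole string must match the pattern.
full_trailing = bool(re.fullmatch(pattern, 'D0000002.txt_hello')) # False
full_exact = bool(re.fullmatch(pattern, 'D0000002.txt'))          # True

print(search_hit, match_prefix, match_trailing, full_trailing, full_exact)
```

In other words, re.match alone does not reject trailing text; re.fullmatch (or explicit ^ and $ with re.search) does.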

Posted by huangapple on 2023-07-20 22:11:26
Source link: https://go.coder-hub.com/76730768.html