Regex - matching without trailing second decimal and digits
Question
My task is to find lines in a text file that match the following pattern: "SMC" followed by a space, 1-3 digits, a period, and 1-3 digits. My issue: the pattern also matches when more digits follow a second period/decimal.
import re

with open("caosDump.txt", 'r', encoding="cp1252") as inp, open("newCaosDump.txt", 'w') as output:
    for line in inp:
        if re.search(r'SMC\s\d{1,3}\.\d{1,3}', line):
            output.write(line)
I have tried many things, such as positive/negative lookahead and word boundaries, but nothing worked. Adding ^ and $ breaks the code.
It also returns lines that contain SMC 14.08.040, but I don't want to include those lines in my new text file. (Note: using Python)
Answer 1
Score: 1
You need to add a negative lookahead like this:
if re.search(r'SMC\s\d{1,3}\.\d{1,3}(?!\.?\d)', line):
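The (?!\.?\d) lookahead fails the match when the last 1-3 digits are immediately followed by a digit, or by a . and a digit. A trailing . with no digit after it, as at the end of a sentence, does not block the match.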
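As a quick check of how the lookahead behaves, here is a minimal sketch. The sample lines are hypothetical and are not taken from caosDump.txt; only the pattern comes from the answer.

```python
import re

# Pattern from the answer: reject the match if the final digits are followed
# by another digit, or by a period and then a digit.
PATTERN = re.compile(r'SMC\s\d{1,3}\.\d{1,3}(?!\.?\d)')

# Hypothetical sample lines, invented for illustration.
samples = [
    "See SMC 14.08 for details",
    "See SMC 14.08.040 for details",
    "Refer to SMC 5.12.",
    "SMC 14.0812 is malformed",
]

# Keep only the lines where the pattern is found somewhere in the line.
kept = [line for line in samples if PATTERN.search(line)]
print(kept)
```

The lookahead also stops the regex engine from backtracking to a shorter run of digits: if it gives up a digit, the next character is that digit, so the lookahead fails again.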
Note that a \b after SMC would be redundant, since you already require whitespace after the word. If SMC must be matched as a whole word, place \b immediately before it, i.e. r'\bSMC\s\d{1,3}\.\d{1,3}(?!\.?\d)'.
If there can be more than one whitespace character after SMC, use \s+ instead of \s.
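The two optional tweaks can be combined into one pattern. The sketch below assumes you want both; the helper name has_smc_reference is hypothetical and only serves the illustration.

```python
import re

# \b before SMC requires a word boundary there; \s+ allows one or more
# whitespace characters between SMC and the number.
STRICT = re.compile(r'\bSMC\s+\d{1,3}\.\d{1,3}(?!\.?\d)')


def has_smc_reference(line: str) -> bool:
    """Return True if the line contains a two-part SMC number (hypothetical helper)."""
    return STRICT.search(line) is not None


print(has_smc_reference("SMC  14.08"))     # two spaces after SMC
print(has_smc_reference("XSMC 14.08"))     # SMC is part of a longer word
print(has_smc_reference("SMC 14.08.040"))  # three-part number
```

Whether to use \b depends on the data: it matters only if SMC can appear as the tail of a longer word in the file.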
See the regex demo.