Regex - matching without trailing second decimal and digits
Question
My task is to find lines in a text file that match the following pattern: "SMC" followed by a space, 1-3 digits, a period, and 1-3 digits. My issue: the pattern also matches when a second period/decimal and more digits follow.
import re

with open("caosDump.txt", 'r', encoding="cp1252") as inp, open("newCaosDump.txt", 'w') as output:
    for line in inp:
        # Keep lines containing "SMC", a space, 1-3 digits, a period, 1-3 digits
        if re.search(r'SMC\s\d{1,3}\.\d{1,3}', line):
            output.write(line)
I have tried many things, such as positive/negative lookaheads and word boundaries, but nothing worked. Adding ^ and $ breaks the code.
It’s also returning the lines that contain SMC 14.08.040, but I don’t want to include these lines in my new text file. (Note: using Python.)
Answer 1
Score: 1
You need to add a negative lookahead like this:
if re.search(r'SMC\s\d{1,3}\.\d{1,3}(?!\.?\d)', line):
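The (?!\.?\d) lookahead fails the match when the last 1-3 digits are immediately followed by a digit or by a . plus a digit. Rejecting a bare digit matters too: without it, the engine could backtrack to a shorter run of digits and still report a match.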
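One way to check this behaviour is to run the pattern against a few strings. The sketch below is only illustrative: the sample lines are invented for the check and are not taken from caosDump.txt.

```python
import re

# Pattern from the answer: the lookahead rejects a following digit or ".digit".
PATTERN = re.compile(r'SMC\s\d{1,3}\.\d{1,3}(?!\.?\d)')

# Hypothetical sample lines, made up for illustration.
samples = [
    "See SMC 14.08 for details",      # two-part reference: should match
    "See SMC 14.08.040 for details",  # third part present: should not match
    "See SMC 14.08. Next sentence",   # trailing full stop only: should match
    "See SMC 14.0812 for details",    # more than 3 digits: should not match
]

results = {s: bool(PATTERN.search(s)) for s in samples}
for s, matched in results.items():
    print(f"{matched!s:5} {s}")
```

A period that merely ends a sentence is still accepted, because the lookahead only rejects a period when a digit comes right after it.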
Note that a \b after SMC is redundant, since the pattern already requires a whitespace after the word. If SMC must be matched as a whole word, \b must be placed immediately before the word, i.e. r'\bSMC\s\d{1,3}\.\d{1,3}(?!\.?\d)'.
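The effect of the leading \b can be seen with a small comparison. This is a sketch under the assumption that some lines might contain SMC as the tail of a longer word; the "XSMC" string is a made-up example.

```python
import re

without_boundary = re.compile(r'SMC\s\d{1,3}\.\d{1,3}(?!\.?\d)')
with_boundary = re.compile(r'\bSMC\s\d{1,3}\.\d{1,3}(?!\.?\d)')

embedded = "XSMC 14.08"   # hypothetical: SMC is the tail of a longer word
standalone = "SMC 14.08"  # SMC as a whole word

# Without \b the pattern finds "SMC 14.08" inside "XSMC 14.08".
embedded_without = bool(without_boundary.search(embedded))
# With \b there is no word boundary between "X" and "S", so nothing matches.
embedded_with = bool(with_boundary.search(embedded))
# A standalone SMC is matched either way.
standalone_with = bool(with_boundary.search(standalone))

print(embedded_without, embedded_with, standalone_with)
```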
If there can be more than one whitespace character after the word SMC, use \s+ instead of \s.
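Putting the pieces together, the filter could look roughly like the sketch below. The helper name keep_smc_lines and the sample lines are assumptions made for illustration; only the regular expression comes from the answer above.

```python
import re

# \b before SMC, \s+ for one or more whitespace characters, and the
# negative lookahead to reject a third numeric part.
SMC_REF = re.compile(r'\bSMC\s+\d{1,3}\.\d{1,3}(?!\.?\d)')

def keep_smc_lines(lines):
    """Return only the lines that contain a two-part SMC reference."""
    return [line for line in lines if SMC_REF.search(line)]

# Hypothetical input lines, standing in for the contents of the dump file.
sample = [
    "SMC 14.08 applies\n",
    "SMC   23.5 applies\n",      # several spaces after SMC
    "SMC 14.08.040 applies\n",   # three-part reference, filtered out
    "No reference here\n",
]

kept = keep_smc_lines(sample)
print(kept)
```

With the files from the question, the same helper could be applied as output.writelines(keep_smc_lines(inp)) inside the existing with block. A line that contains both a two-part and a three-part reference would still be kept, since re.search only needs one match per line.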
See the regex demo.