Python regex, one word with n characters followed by two words with one char
Question
I need to filter strings that start with a word containing 3 or more characters, followed by exactly two words that have only one character. After these three words, anything can follow.
What I tried is this expression:
pattern = r'\w{3,}\s\w\s\w.*'
but it matches the string "apple wrong b c", which is not correct (the word "wrong" has more than one character).
A complete example is here:
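import pandas as pd
df = pd.DataFrame({'text': ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']})
# a word of 3+ characters, then two one-character words, then anything
pattern = r'\w{3,}\s\w\s\w.*'
# boolean mask: True where the pattern is found anywhere in the string
matches = df['text'].str.contains(pattern, regex=True)
result = df[matches]
print(result)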
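To see why the unwanted row is kept, it may help to check where the match actually starts. The sketch below uses the standard re module rather than pandas; str.contains is based on re.search, so the behavior should be the same. The variable names m and matched_text are only illustrative.

```python
import re

pattern = r'\w{3,}\s\w\s\w.*'

# re.search scans the whole string, so the match may begin at any position.
m = re.search(pattern, 'apple wrong b c')
matched_text = m.group()

# The match starts at "wrong", not at "apple".
print(m.span(), matched_text)  # (6, 15) wrong b c
```

The pattern is satisfied by "wrong b c", so the first word "apple" is simply skipped.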
Answer 1
Score: 1
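Adding a ^ at the beginning should solve the problem. str.contains looks for the pattern anywhere in the string, so without an anchor the match can begin at a later word such as "wrong". The ^ forces the match to start at the beginning of the string.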
pattern = r'^\w{3,}\s\w\s\w.*'
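A quick check of the anchored pattern against the sample strings from the question, sketched with the re module so it runs without pandas. The stricter variant at the end is an assumption that goes beyond the answer: it adds \b so the second one-character word cannot be the start of a longer word. The test string 'apple a bcd' is made up for illustration.

```python
import re

texts = ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']

# Pattern from the answer: ^ pins the match to the start of the string.
anchored = re.compile(r'^\w{3,}\s\w\s\w.*')
kept = [t for t in texts if anchored.search(t)]
print(kept)  # ['apple a b correct', 'apple a b c correct']

# Assumption (not part of the answer): require a word boundary after the
# third word, so a longer word such as "bcd" is rejected.
strict = re.compile(r'^\w{3,}\s\w\s\w\b')
loose_hit = bool(anchored.search('apple a bcd'))
strict_hit = bool(strict.search('apple a bcd'))
print(loose_hit, strict_hit)  # True False
```

Either pattern can be passed to df['text'].str.contains in the same way as the original. Whether the stricter form is needed depends on whether inputs like 'apple a bcd' can occur.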