Python regex, one word with n characters followed by two words with one char


Question

I need to filter strings that start with a word of 3 or more characters, followed by exactly two single-character words. Anything can follow these three words.

What I tried is this expression:

pattern = r'\w{3,}\s\w\s\w.*'

but it also matches the string 'apple wrong b c', which is incorrect (the word "wrong" has more than one character).

Here is a complete example:

import pandas as pd

df = pd.DataFrame({'text': ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']})
pattern = r'\w{3,}\s\w\s\w.*'
matches = df['text'].str.contains(pattern, regex=True)
result = df[matches]
print(result)
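
One way to see why the unanchored pattern lets 'apple wrong b c' through is to run it with re.search, which, like str.contains, looks for a match anywhere in the string. This is a minimal sketch using only the standard library; the variable name m is illustrative and not part of the original example.

```python
import re

# The pattern from the question, without a start-of-string anchor.
pattern = r'\w{3,}\s\w\s\w.*'

# re.search scans the string for the first position where the pattern fits.
m = re.search(pattern, 'apple wrong b c')

# The match does not begin at 'apple'. It begins at 'wrong', which
# takes the role of the "3 or more characters" word.
print(m.start(), repr(m.group()))
```

The match starts in the middle of the string, so the first word is never checked against the pattern at all.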

Answer 1

Score: 1

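Adding ^ at the beginning of the pattern should solve the problem. str.contains looks for a match anywhere in the string, so without the anchor the pattern can match starting at "wrong" in 'apple wrong b c'. The ^ forces the match to begin at the start of the string.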

pattern = r'^\w{3,}\s\w\s\w.*'
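
To check the anchored pattern against the four strings from the question, here is a small standard-library sketch. It also includes a stricter variant, pattern_strict (a name introduced here, not part of the original answer), which assumes the second one-character word must end at a word boundary, so that a string such as 'apple a bcd' is rejected as well.

```python
import re

texts = ['apple wrong', 'apple wrong b c', 'apple a b correct', 'apple a b c correct']

# Pattern from the answer: '^' pins the match to the start of the string.
pattern = r'^\w{3,}\s\w\s\w.*'

# Hypothetical stricter variant: '\b' requires the second single-character
# word to end right there, so 'apple a bcd' no longer passes.
pattern_strict = r'^\w{3,}\s\w\s\w\b'

kept = [t for t in texts if re.search(pattern, t)]
kept_strict = [t for t in texts if re.search(pattern_strict, t)]

print(kept)         # ['apple a b correct', 'apple a b c correct']
print(kept_strict)  # the same two strings

# The two patterns differ on input like this:
print(bool(re.search(pattern, 'apple a bcd')))         # True
print(bool(re.search(pattern_strict, 'apple a bcd')))  # False
```

Either pattern string can be passed unchanged to df['text'].str.contains(pattern, regex=True). Alternatively, Series.str.match only matches at the start of each string, so the original pattern without ^ should behave the same way there.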

huangapple
  • Posted on May 26, 2023, 00:05:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76334295.html