How to identify a set of nearly-identical sentences while excluding sentences containing a specified word?
Question
I am trying to create a regex that identifies sentences with the following structure: the sentence begins with "I require", followed by any number of arbitrary words, and ends with "to disclose the information." If the sentence contains the word "refuse", the regex should reject it as not fitting the pattern.
When applied to the following sentences, the regex should return these results:
- I require the creepy caterpillar to disclose the information. --TRUE
- I require the giant bug to refuse to disclose the information. --FALSE
- I require the dusty moth to disclose the information. --TRUE
- You must not refuse. --FALSE
Here's what I have tried:
I can get 3 out of the 4 example sentences correct by writing ^(?:(?!refuse)(.))+$
I can get 2 out of the 4 example sentences correct by writing I require [\s\w]+ to disclose the information.
I can get 2 of the 4 example sentences correct by writing ^(?:(?!refuse)(I require [\s\w]+ to disclose the information.))$
Edit: This question differs from the one at Regex Multiple Conditions because that question deals with two relatively simple truth conditions, whereas this one involves a complex truth condition in the form of a sentence with variables in the middle of it. However, this question could be considered a duplicate, because the answer at Regex Multiple Conditions also contains the piece of information I was missing: the negative lookahead needed a wildcard.
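A minimal sketch of that last point, assuming Python's re module (the question does not name a regex flavor). The names SENTENCE, no_wildcard and with_wildcard are illustrative, and the final period is escaped here, which the attempts above do not do. Without a wildcard, the lookahead only inspects the text at the position where it is placed; with .* it scans the rest of the line.

```python
import re

SENTENCE = "I require the giant bug to refuse to disclose the information."

# Lookahead without a wildcard: only checks that the string does not
# literally start with "refuse" at position 0.
no_wildcard = re.compile(
    r"^(?!refuse)I require [\s\w]+ to disclose the information\.$"
)

# Lookahead with ".*": checks that "refuse" does not appear anywhere
# between the start of the string and the end of the line.
with_wildcard = re.compile(
    r"^(?!.*refuse)I require [\s\w]+ to disclose the information\.$"
)

print(bool(no_wildcard.match(SENTENCE)))    # True: the sentence is wrongly accepted
print(bool(with_wildcard.match(SENTENCE)))  # False: the sentence is rejected
```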
Answer 1
Score: 2
You can use
r'^(?!.*\brefuse\b)I require\b[\w\s]*\bto disclose the information\.$'
This regular expression can be broken down as follows.
^                                  Match beginning of string
(?!                                Begin negative lookahead
.*                                 Match zero or more characters other than line terminators
\brefuse\b                         Match literal surrounded by word boundaries
)                                  End negative lookahead
I require\b                        Match literal followed by word boundary
[\w\s]*                            Match zero or more word characters or whitespace characters
\bto disclose the information\.    Match literal preceded by word boundary
$                                  Match end of string
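As a quick check, the breakdown above can be written as a verbose-mode pattern and run against the four sentences from the question. This is only a sketch assuming Python's re module (the r'...' notation suggests Python, but the question does not say so); PATTERN, SENTENCES and fits are illustrative names, not part of the original answer.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
# In verbose mode unescaped whitespace is ignored, so each literal
# space is written as "\ ".
PATTERN = re.compile(
    r"""
    ^                                     # beginning of string
    (?!.*\brefuse\b)                      # negative lookahead: no whole word "refuse" ahead
    I\ require\b                          # literal followed by a word boundary
    [\w\s]*                               # zero or more word or whitespace characters
    \bto\ disclose\ the\ information\.    # literal preceded by a word boundary
    $                                     # end of string
    """,
    re.VERBOSE,
)

SENTENCES = [
    "I require the creepy caterpillar to disclose the information.",
    "I require the giant bug to refuse to disclose the information.",
    "I require the dusty moth to disclose the information.",
    "You must not refuse.",
]


def fits(sentence: str) -> bool:
    """Return True if the sentence fits the pattern."""
    return PATTERN.match(sentence) is not None


for sentence in SENTENCES:
    print(f"{sentence} --{str(fits(sentence)).upper()}")
```

With these inputs the loop prints TRUE, FALSE, TRUE, FALSE, matching the expected results listed in the question.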