How to identify a set of nearly-identical sentences while excluding sentences containing a specified word?

Question

I am trying to create a regex that will identify sentences that are structured as follows: the sentence begins with "I require", followed by any number of random words, and ends with "to disclose the information." If the sentence contains the word "refuse", the regex should reject it as not fitting the pattern.

When applied to the following sentences, the regex should return:

  • I require the creepy caterpillar to disclose the information. --TRUE
  • I require the giant bug to refuse to disclose the information. --FALSE
  • I require the dusty moth to disclose the information. --TRUE
  • You must not refuse. --FALSE
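As a point of reference, the rules above can also be written as plain string checks, which gives an oracle for testing any candidate regex. This is a hypothetical Python sketch: the name fits_pattern is illustrative, and it assumes "contains the word refuse" means a whitespace-separated word with simple trailing punctuation ignored.

```python
def fits_pattern(sentence: str) -> bool:
    """Hypothetical regex-free check of the rules stated in the question."""
    # Assumption: a "word" is a whitespace-separated token with simple
    # punctuation stripped, so "refuse." in "You must not refuse." counts.
    words = [w.strip(".,;:!?") for w in sentence.split()]
    if "refuse" in words:
        return False
    # The sentence must start with "I require" and end with the fixed clause.
    return (sentence.startswith("I require ")
            and sentence.endswith(" to disclose the information."))


# The four example sentences and the results the question expects.
EXAMPLES = [
    ("I require the creepy caterpillar to disclose the information.", True),
    ("I require the giant bug to refuse to disclose the information.", False),
    ("I require the dusty moth to disclose the information.", True),
    ("You must not refuse.", False),
]

for sentence, expected in EXAMPLES:
    print(fits_pattern(sentence), sentence)
```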

Here's what I have tried:

I can get 3 out of the 4 example sentences correct by writing ^(?:(?!refuse)(.))+$

I can get 2 out of the 4 example sentences correct by writing I require [\s\w]+ to disclose the information.

I can get 2 of the 4 example sentences correct by writing ^(?:(?!refuse)(I require [\s\w]+ to disclose the information.))$

Edit: This question differs from Regex Multiple Conditions because that question deals with two relatively simple truth conditions, whereas this one involves a complex truth condition: a fixed sentence with variable words in the middle of it. The answer to Regex Multiple Conditions, however, could be considered a duplicate, because it contains the piece of information I was missing: the negative lookahead needed a wildcard (.*) in front of the word.
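A small Python check illustrates the missing piece described in the edit: without a wildcard, a negative lookahead placed right after ^ is only tested at the start of the string, so a "refuse" later in the sentence is never seen. The variable names are illustrative; the third pattern is the last attempt from the question, copied as written.

```python
import re

# The sentence from the question that should be rejected.
sentence = "I require the giant bug to refuse to disclose the information."

# Lookahead without a wildcard: only asks whether the text at position 0
# starts with "refuse". It does not, so the check passes.
no_wildcard = re.compile(r"^(?!refuse)")

# Lookahead with a leading .*: scans forward along the line, finds "refuse",
# so the negative lookahead fails.
with_wildcard = re.compile(r"^(?!.*refuse)")

# The last attempt from the question, copied as written.
attempt = re.compile(r"^(?:(?!refuse)(I require [\s\w]+ to disclose the information.))$")

print(no_wildcard.match(sentence) is not None)    # True
print(with_wildcard.match(sentence) is not None)  # False
print(attempt.match(sentence) is not None)        # True: "refuse" slips through
```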

Answer 1

Score: 2

You can use

r'^(?!.*\brefuse\b)I require\b[\w\s]*\bto disclose the information\.$'
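A minimal Python sketch, assuming the standard re module, that runs this pattern against the four example sentences from the question and prints whether each one matched next to the expected result:

```python
import re

# The pattern from this answer, compiled once.
PATTERN = re.compile(r'^(?!.*\brefuse\b)I require\b[\w\s]*\bto disclose the information\.$')

# The four example sentences from the question, with the expected result.
examples = [
    ("I require the creepy caterpillar to disclose the information.", True),
    ("I require the giant bug to refuse to disclose the information.", False),
    ("I require the dusty moth to disclose the information.", True),
    ("You must not refuse.", False),
]

for sentence, expected in examples:
    matched = PATTERN.search(sentence) is not None
    print(matched, expected, sentence)
```

Because the pattern is anchored with ^ and $, re.search and re.fullmatch behave the same here. For text with one sentence per line, compiling with re.MULTILINE would let ^ and $ match at line boundaries, and since . does not match a newline, the lookahead still only inspects the current line.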

This regular expression can be broken down as follows.

^                                Match beginning of string
(?!                              Begin negative lookahead
  .*                             Match zero or more characters other
                                 than line terminators
  \brefuse\b                     Match literal surrounded by word boundaries
)                                End negative lookahead
I require\b                      Match literal followed by word boundary
[\w\s]*                          Match zero or more word characters or
                                 whitespace characters
\bto disclose the information\.  Match literal preceded by word boundary
$                                Match end of string
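The same breakdown can be kept next to the pattern itself with Python's re.VERBOSE flag. This is a sketch of an equivalent rewrite, not part of the original answer; in verbose mode unescaped whitespace is ignored, so the literal spaces are escaped as "\ ".

```python
import re

# The answer's pattern, written with re.VERBOSE so each piece carries its
# explanation. Unescaped whitespace is ignored in this mode, so literal
# spaces are written as "\ ".
VERBOSE_PATTERN = re.compile(r"""
    ^                                   # beginning of string
    (?!.*\brefuse\b)                    # negative lookahead: fail if the word
                                        # "refuse" appears anywhere on the line
    I\ require\b                        # literal "I require", then a word boundary
    [\w\s]*                             # zero or more word or whitespace characters
    \bto\ disclose\ the\ information\.  # literal, preceded by a word boundary
    $                                   # end of string
""", re.VERBOSE)

# The compact form from the answer, for comparison.
COMPACT_PATTERN = re.compile(
    r'^(?!.*\brefuse\b)I require\b[\w\s]*\bto disclose the information\.$')

for sentence in [
    "I require the creepy caterpillar to disclose the information.",
    "I require the giant bug to refuse to disclose the information.",
    "You must not refuse.",
]:
    # Both forms should agree on every input.
    same = bool(VERBOSE_PATTERN.search(sentence)) == bool(COMPACT_PATTERN.search(sentence))
    print(same, sentence)
```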

huangapple
  • This article was posted on March 7, 2023 at 09:04:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75657196.html