Regex to check if string contains anything other than allowed words
Question
I would like to check if a string contains any word other than some predefined ones. The predefined words are What is, plus, minus, multiplied by, and divided by; some of the phrases include a single whitespace. I've read this post and this one, both using negative lookaheads, but couldn't come up with a pattern that worked.
For example, input text "What is plus abc divided by" should come back as "abc" not recognized.
What would be a correct regex for this?
Edit:
Note that I don't care what the invalid token is, just that it exists. It can be anything, a word or a number. The question can also be thought of as "check if the input contains only allowed words".
Answer 1
Score: 1
Simply join them up in a group:
(?:What is|plus|minus|multiplied by|divided by)
Note that if you have, for example, multiply and multiply by (i.e. one token that starts with another), multiply by must come first:
(?:What is|plus|minus|multiply by|multiply)
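A small sketch of why the order matters, written in Python's `re` module as an assumption (the question does not name a regex flavor). Alternation tries the branches left to right, so a shorter branch listed first wins before the longer phrase is tried. The names `short_first` and `long_first` are illustrative only.

```python
import re

# Shorter alternative listed first: "multiply" matches and " by" is left over.
short_first = re.compile(r"(?:multiply|multiply by)")
# Longer alternative listed first: the whole phrase is consumed.
long_first = re.compile(r"(?:multiply by|multiply)")

text = "multiply by"
print(short_first.match(text).group())  # multiply
print(long_first.match(text).group())   # multiply by
```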
To check if the string only contains valid tokens, use:
^ # Match at the start of string
\g<token> # a pre-defined token
(?:\s+\g<token>)* # followed by 0 or more tokens
$ # right before the end of string.
...where \g<token> denotes the expression above.
Try it on regex101.com.
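Python's built-in `re` module has no `\g<token>` subroutine call, so one way to sketch the same check there is to splice the token expression into the larger pattern as a string. The names `TOKEN`, `ONLY_TOKENS` and `contains_only_allowed` are hypothetical, and `fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# The token expression from above, kept as a plain string so it can be reused.
TOKEN = r"(?:What is|plus|minus|multiplied by|divided by)"

# One token, followed by zero or more whitespace-separated tokens.
ONLY_TOKENS = re.compile(rf"{TOKEN}(?:\s+{TOKEN})*")

def contains_only_allowed(text: str) -> bool:
    """Return True if the whole string is made of allowed tokens only."""
    return ONLY_TOKENS.fullmatch(text) is not None

print(contains_only_allowed("What is plus divided by"))      # True
print(contains_only_allowed("What is plus abc divided by"))  # False
```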
Original answer
Since we also need to find the (first) invalid token, you need to match every run of non-whitespace characters and store those runs that are not matched by the expression above in a group:
(?:What is|plus|minus|multiplied by|divided by)|(\S+)
If the match contains group 1, that means it is a non-recognized token. Output an error accordingly.
Try it on regex101.com.
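As a rough illustration in Python (an assumed language; `SCANNER` and `first_invalid_token` are made-up names), the pattern can be scanned over the input and the first match that fills group 1 reported:

```python
import re

# Allowed phrases first; anything else that is not whitespace lands in group 1.
SCANNER = re.compile(r"(?:What is|plus|minus|multiplied by|divided by)|(\S+)")

def first_invalid_token(text):
    """Return the first unrecognized token, or None if there is none."""
    for match in SCANNER.finditer(text):
        if match.group(1) is not None:
            return match.group(1)
    return None

print(first_invalid_token("What is plus abc divided by"))  # abc
print(first_invalid_token("What is plus divided by"))      # None
```

Because the pattern has no word boundaries, two allowed words written together, such as "plusminus", are accepted by this scan.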
Answer 2
Score: 0
> "... check if the input contains only allowed words".
You would have to then check the result to see if the non-specified value is allowed.
What is +(.+?) +(?:plus|minus) +(.+?) +(?:(?:multiplied|divided) by) +(.+)
Alternately, specify the values. In this case it's most likely numbers only.
What is +(\d+) +(?:plus|minus) +(\d+) +(?:(?:multiplied|divided) by) +(\d+)
Example
What is 1 plus 2 divided by 3
The output would be 1, 2, and 3.
And, ultimately, allow for fractional values.
What is +(\d+(?:\.\d+)?) +(?:plus|minus) +(\d+(?:\.\d+)?) +(?:(?:multiplied|divided) by) +(\d+(?:\.\d+)?)
What is 1.23 plus 2.3 divided by 3
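A minimal Python sketch of the fractional pattern in use. The language, the helper name `extract_operands`, and the choice of `fullmatch` (which rejects any extra text around the expression) are assumptions, not part of the answer.

```python
import re

# An integer or decimal operand, captured as a group.
NUMBER = r"(\d+(?:\.\d+)?)"

EXPRESSION = re.compile(
    rf"What is +{NUMBER} +(?:plus|minus) +{NUMBER} "
    rf"+(?:(?:multiplied|divided) by) +{NUMBER}"
)

def extract_operands(text):
    """Return the three operands as strings, or None if the text does not fit."""
    match = EXPRESSION.fullmatch(text)
    return match.groups() if match else None

print(extract_operands("What is 1.23 plus 2.3 divided by 3"))  # ('1.23', '2.3', '3')
print(extract_operands("What is 1 plus abc divided by 3"))     # None
```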
Answer 3
Score: -1
Use a negative lookahead to match any input that is not made up solely of the allowed phrases:
^(?!((^| )(What is|plus|minus|multiplied by|divided by)(?= |$))+$).*
See live demo.
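A sketch of this idea in Python (an assumed language; `INVALID` and `has_invalid_token` are illustrative names). Here the trailing separator is written as a lookahead, `(?= |$)`, because a space consumed after one phrase would no longer be available to the leading `(^| )` of the next phrase. A match means the input is not made up solely of allowed phrases.

```python
import re

# Matches only when the input is NOT a space-separated run of allowed phrases.
INVALID = re.compile(
    r"^(?!((^| )(What is|plus|minus|multiplied by|divided by)(?= |$))+$).*"
)

def has_invalid_token(text: str) -> bool:
    """Return True if the input contains anything besides allowed phrases."""
    return INVALID.match(text) is not None

print(has_invalid_token("What is plus divided by"))      # False
print(has_invalid_token("What is plus abc divided by"))  # True
```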