Regex remove the rest of the words after finding the first number
Question
I want to remove all the words, including the number itself, once the expression finds the first number. Example:
Input: Hola 4.4.2 como estas
Output: Hola
My expression:
[^(0-9)]*$
DEMO: https://regexr.com/7dbl1
Answer 1
Score: 2
This regex:
^[^0-9]+
The regular expression matches as follows:
Node | Explanation |
---|---|
^ | the beginning of the string anchor |
[^0-9]+ | any character except '0' to '9' (1 or more times, matching as many as possible) |
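A minimal sketch of this approach in Python, assuming the built-in `re` module (the question does not name a language). The helper name `text_before_first_digit` is hypothetical; it also strips the trailing space, since `^[^0-9]+` on its own matches `'Hola '` rather than `'Hola'`.

```python
import re

def text_before_first_digit(s: str) -> str:
    """Hypothetical helper: text before the first digit, trailing whitespace removed."""
    # ^[^0-9]+ : one or more non-digit characters, starting at the beginning
    m = re.match(r"^[^0-9]+", s)
    # If the string starts with a digit there is no match, so return "".
    return m.group(0).rstrip() if m else ""

print(text_before_first_digit("Hola 4.4.2 como estas"))  # Hola
```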
Answer 2
Score: 2
Your regular expression [^(0-9)]*$ reads, "match zero or more (*) characters in the character class [^(0-9)] followed by the end of the string ($), where the character class consists of all characters other than (^) the characters '(', ')' and the digits '0' to '9'." Right away you see that you want the character class to be [^0-9], so let's change your regex to
[^0-9]*$
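To see that the parentheses really are members of the class, here is a small Python sketch (assuming the built-in `re` module). The test string `"a(b)c1d"` is made up purely for illustration.

```python
import re

sample = "a(b)c1d"  # illustrative input containing parentheses and one digit

# Inside a character class, '(' and ')' are ordinary characters,
# so [^(0-9)] excludes them as well as the digits.
with_parens = re.findall(r"[^(0-9)]+", sample)
# [^0-9] excludes only the digits.
without_parens = re.findall(r"[^0-9]+", sample)

print(with_parens)     # ['a', 'b', 'c', 'd']
print(without_parens)  # ['a(b)c', 'd']
```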
The regex engine initially matches 'Hola ' in your string 'Hola 4.4.2 como estas', but then finds that the last character in the match (the space) is not the last character in the string, so the match fails. It then does considerable backtracking<sup>1</sup> before it concludes that no match begins before the first '4'.
No match starts at the '4'. The '.' does match, but it is not at the end of the string, so the process continues until ' como estas' is matched. That match succeeds because it ends at the end of the string.
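The behaviour described above can be reproduced with a short Python sketch (assuming the built-in `re` module; `re.search` returns the leftmost match, as regexr does for the first match):

```python
import re

s = "Hola 4.4.2 como estas"

# The question's original pattern and the corrected class, both anchored with $.
original = re.search(r"[^(0-9)]*$", s)
corrected = re.search(r"[^0-9]*$", s)

# Both return the first run of non-digits that reaches the end of the string,
# not the text before the first digit.
print(repr(original.group(0)))   # ' como estas'
print(repr(corrected.group(0)))  # ' como estas'
print(corrected.start())         # 10 (the space right after '2')
```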
Clearly, to match 'Hola ', you want to remove the anchor $ from your regex: [^0-9]*.
If you do that there will be eight matches in your string:
- 'Hola ';
- '.' (twice);
- the empty string before each digit (thrice);
- ' como estas'; and
- the empty string at the very end of the string.
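A minimal Python sketch of the unanchored pattern, assuming Python 3.7 or later (whose `re.findall` reports empty matches adjacent to non-empty ones). Note that, like most engines, Python also reports an empty match at the very end of the string.

```python
import re

s = "Hola 4.4.2 como estas"

# [^0-9]* can match zero characters, so a global search also yields
# empty matches at each digit and at the end of the string.
matches = re.findall(r"[^0-9]*", s)
print(matches)
# ['Hola ', '', '.', '', '.', '', ' como estas', '']

# The first match is the text before the first digit.
print(repr(matches[0]))  # 'Hola '
```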
Presumably you are only interested in the first match, which is fine.
If you were to change your regular expression to [^0-9]+, no empty strings would be matched any more.
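A quick Python check of the `+` version (assuming the built-in `re` module):

```python
import re

s = "Hola 4.4.2 como estas"

# + requires at least one non-digit, so empty matches disappear.
all_runs = re.findall(r"[^0-9]+", s)
print(all_runs)  # ['Hola ', '.', '.', ' como estas']

# re.search stops at the first match, which is the part wanted here.
first = re.search(r"[^0-9]+", s).group(0)
print(repr(first))  # 'Hola '
```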
If you did not wish to match the space at the end of 'Hola ', you could append a negative lookbehind to your regular expression:
[^0-9]*(?<! )
(?<! ) reads, "the previous character matched may not be a space".
This, of course, requires a regex engine that supports lookbehinds.
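Python's `re` module supports fixed-width lookbehinds, so the pattern can be tried directly. This is a sketch assuming Python; note that `(?<! )` only rejects a literal space, not tabs or other whitespace.

```python
import re

s = "Hola 4.4.2 como estas"

# The greedy [^0-9]* first grabs 'Hola '; the negative lookbehind (?<! )
# then fails because the last character is a space, so the engine
# backtracks one character to 'Hola'.
m = re.match(r"[^0-9]*(?<! )", s)
print(m.group(0))  # Hola

# With several spaces before the digit it keeps backtracking past all of them.
m2 = re.match(r"[^0-9]*(?<! )", "Hola   4")
print(m2.group(0))  # Hola
```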
<sup>1 See a simulation of the steps performed by the regex engine for your regex by scrolling to the bottom of the left column here and selecting "Regex Debugger"</sup>