Regex remove the rest of the words after finding the first number

Question

I want to remove everything from the first number onward, including the number itself. For example:

Input: Hola 4.4.2 como estas

Output: Hola

My expression:

[^(0-9)]*$

DEMO: https://regexr.com/7dbl1

Answer 1

Score: 2

This regex:

^[^0-9]+

Online Demo

The regular expression matches as follows:

Node      Explanation
^         anchor: the beginning of the string
[^0-9]+   any character except the digits '0' to '9', one or more times (matching as many as possible)
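
As a rough illustration, here is how the pattern could be applied in Python. Python's re module is an assumption, since the question does not name a language, and the helper name text_before_first_digit is made up for this sketch. Note that the match keeps the space in front of the number, so the sketch strips it afterwards:

```python
import re


def text_before_first_digit(text: str) -> str:
    """Return the leading part of `text` that contains no digits."""
    # ^ anchors at the start; [^0-9]+ takes as many non-digits as possible.
    match = re.search(r"^[^0-9]+", text)
    # No match means the string is empty or starts with a digit.
    return match.group(0) if match else ""


prefix = text_before_first_digit("Hola 4.4.2 como estas")
print(repr(prefix))           # 'Hola '
print(repr(prefix.rstrip()))  # 'Hola'
```

If the input starts with a digit, the helper returns an empty string rather than raising an error.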

Answer 2

Score: 2

Your regular expression

[^(0-9)]*$

reads, "match zero or more (*) characters in the character class [^(0-9)] followed by the end of the string ($)", where the character class consists of all characters other than (^) the characters '(' and ')' and the digits '0' to '9'. Right away you can see that you want the character class to be [^0-9], so let's change your regex to

[^0-9]*$

The regex engine initially matches 'Hola ' in your string

'Hola 4.4.2 como estas'

but then finds that the last character in the match (the space) is not the last character in the string, so the match fails. It then does considerable backtracking [1] before it concludes that no match begins before the first '4'.

At the '4' itself only the empty string can be matched, and that is not at the end of the string. '.' does match, but it is not at the end of the string either, so the process continues until ' como estas' is matched. That succeeds because the end of that match is at the end of the string.
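
The behaviour described above can be checked with a short sketch. Python's re module is an assumption here (the question does not say which engine is used), and the second test string "a(b" is made up purely to show that the parentheses are literal members of the character class:

```python
import re

text = "Hola 4.4.2 como estas"

# Original pattern: '(' and ')' are literal members of the negated class.
original = re.search(r"[^(0-9)]*$", text)
# Corrected class, still anchored at the end of the string.
anchored = re.search(r"[^0-9]*$", text)

print(repr(original.group(0)))  # ' como estas'
print(repr(anchored.group(0)))  # ' como estas'
print(anchored.start())         # 10

# The two classes differ only when parentheses are present.
with_parens = re.search(r"[^(0-9)]*$", "a(b").group(0)
without_parens = re.search(r"[^0-9]*$", "a(b").group(0)
print(repr(with_parens))     # 'b'
print(repr(without_parens))  # 'a(b'
```

With the $ anchor in place, both patterns return the tail of the string, not 'Hola '.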


Clearly, to match 'Hola ', you want to remove the anchor $ from your regex: [^0-9]*.

If you do that there will be seven matches in your string (some engines, Python's re among them, additionally report an empty match at the very end of the string):

  • 'Hola ';
  • '.' (twice);
  • the empty string before each digit (thrice); and
  • ' como estas'.

Presumably you are only interested in the first match, which is fine.

Demo


If you were to change your regular expression to

[^0-9]+

empty strings before each digit would no longer match.
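
A minimal sketch comparing the two quantifiers, assuming Python's re module (the engine is not specified in the question). In Python 3.7 and later, the list for [^0-9]* should also end with one more empty match at the end of the string, in addition to the empty matches before each digit:

```python
import re

text = "Hola 4.4.2 como estas"

# '*' allows zero-length matches, so empty strings show up in the result.
star_matches = re.findall(r"[^0-9]*", text)
# '+' requires at least one character, so only non-empty runs are returned.
plus_matches = re.findall(r"[^0-9]+", text)

print(star_matches)
print(plus_matches)  # ['Hola ', '.', '.', ' como estas']

# If only the first match is of interest, re.search is enough.
first = re.search(r"[^0-9]+", text)
print(repr(first.group(0)))  # 'Hola '
```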


If you did not wish to match the space at the end of 'Hola ' you could append a negative lookbehind to your regular expression:

[^0-9]*(?<! )

(?<! ) reads, "the character immediately before the current position may not be a space".

This of course requires that the regex engine supports lookbehinds.
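
Python's re module is one engine that supports fixed-width lookbehinds, so the pattern can be tried there. This is only a sketch under that assumption, since the question does not name an engine:

```python
import re

text = "Hola 4.4.2 como estas"

# Greedy [^0-9]* first takes 'Hola '; the lookbehind rejects the trailing
# space, so the engine backtracks by one character to 'Hola'.
match = re.search(r"[^0-9]*(?<! )", text)
result = match.group(0)
print(repr(result))  # 'Hola'
```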

Demo

[1] See a simulation of the steps performed by the regex engine for your regex by scrolling to the bottom of the left column here and selecting "Regex Debugger".

huangapple
  • Posted on May 7, 2023, 03:59:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76190860.html