Python regex to match every word in a sentence up to the last hyphenated word is not working


Question


I have a regex that should match all the words in a sentence up to, but not including, the last word that contains a hyphen.

This is my input string:

 13wfe + 123dg Tetest-xt ldf-dfdlj-dfldjf-dfs test 123

So far, using the regex below (taken from this post), I get this match:

 13wfe + 123dg Tetest-xt ldf-dfdlj-

But my expected output is only this:

 13wfe + 123dg Tetest-xt

This is the regex I am using: (.*\b)(?=\w+-)

I do not want the match to include the last hyphenated word. How can I fix this?

Answer 1

Score: 1


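Require whitespace in front of the last hyphenated word inside the lookahead, and drop the word boundary, which is not needed: (.*)(?=\s\w+-)

The original pattern fails because \b also matches right after a hyphen, so its lookahead can succeed in the middle of ldf-dfdlj-dfldjf-dfs.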

> regex101
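
A minimal sketch of the difference between the two patterns, run against the input string from the question. The helper name `before_last_hyphenated` is hypothetical and only wraps the lookahead pattern from this answer; it assumes a single-line input and returns None when nothing matches.

```python
import re

TEXT = "13wfe + 123dg Tetest-xt ldf-dfdlj-dfldjf-dfs test 123"

# Pattern from the question: \b also matches right after a hyphen,
# so the lookahead can succeed in the middle of a hyphenated word.
ORIGINAL = re.compile(r"(.*\b)(?=\w+-)")

# Pattern from this answer: the lookahead must start at whitespace,
# so the match can only end in front of a whole word.
FIXED = re.compile(r"(.*)(?=\s\w+-)")


def before_last_hyphenated(text):
    """Hypothetical helper: text before the last hyphenated word, or None if no match."""
    match = FIXED.search(text)
    return match.group(1) if match else None


print(ORIGINAL.search(TEXT).group(1))  # 13wfe + 123dg Tetest-xt ldf-dfdlj-
print(before_last_hyphenated(TEXT))    # 13wfe + 123dg Tetest-xt
```

Because the lookahead requires whitespace, a string with no hyphenated word, or one whose only hyphenated word is the very first word, produces no match, which is why the helper checks for None.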

Answer 2

Score: 1

You can use the following pattern. It captures the text up to and including the first hyphenated word. If the text does not contain a hyphenated word, the entire text is captured.

```none
(.+?\w+(?:-\w+)+|.+)(?: +.+)?
```

```python
import re

strings = ['abc 123 def 456 ghi 789 jkl',
           'abc 123 def 456 ghi 789-jkl',
           'abc 123 def 456-ghi 789-jkl',
           'abc 123 def-456 ghi-789-jkl',
           'abc-123 def-456 ghi-789-jkl']
# Group 1: shortest prefix ending in a hyphenated word, or the whole text.
pattern = re.compile(r'(.+?\w+(?:-\w+)+|.+)(?: +.+)?')
for string in strings:
    print(pattern.match(string).group(1))
```

Output

```none
abc 123 def 456 ghi 789 jkl
abc 123 def 456 ghi 789-jkl
abc 123 def 456-ghi
abc 123 def-456
abc-123
```
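
As a rough check, the sketch below applies this pattern to the input string from the question and compares it with the lookahead pattern (.*)(?=\s\w+-) from the first answer. The variable names are illustrative only. The two patterns agree on the question's input, but they can differ on other inputs, because this pattern stops after the first hyphenated word rather than before the last one.

```python
import re

# Pattern from this answer: capture through the first hyphenated word.
first_hyphenated = re.compile(r"(.+?\w+(?:-\w+)+|.+)(?: +.+)?")
# Pattern from the first answer: capture everything before the last one.
before_last = re.compile(r"(.*)(?=\s\w+-)")

question_input = "13wfe + 123dg Tetest-xt ldf-dfdlj-dfldjf-dfs test 123"
three_hyphenated = "abc-123 def-456 ghi-789-jkl"

results = {}
for text in (question_input, three_hyphenated):
    results[text] = (
        first_hyphenated.match(text).group(1),
        before_last.search(text).group(1),
    )
    print(results[text])

# ('13wfe + 123dg Tetest-xt', '13wfe + 123dg Tetest-xt')
# ('abc-123', 'abc-123 def-456')
```

Which behavior is wanted depends on how the requirement is read: the question's wording asks for everything before the last hyphenated word, which for the second string is abc-123 def-456.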

huangapple
  • Posted on 2023-07-03 22:45:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605838.html