Python regex to match every word in a sentence up to the last word that contains an underscore
Question
I am trying to find a regex that matches every word in a sentence up to, but not including, the last word that has an underscore in it.
For example:
13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123
In this example, I want to get only:
13wfe + 123dg Text
I tried something along these lines:
^.*?(?=_)
but it returns:
13wfe + 123dg Text ldf
You can find the regex here. Kindly guide me in this scenario.
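The behaviour above can be reproduced with a short Python sketch. It assumes the pattern is applied with re.search; the variable names are only illustrative.

```python
import re

text = "13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123"

# The lazy .*? expands one character at a time and stops as soon as the
# lookahead (?=_) succeeds, i.e. right before the FIRST underscore.
attempt = re.search(r"^.*?(?=_)", text)
print(attempt.group())  # 13wfe + 123dg Text ldf
```

Because the quantifier is lazy, the match ends in the middle of the underscore word instead of before it.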
Update: using the regex provided by @liginity, I can find the substring, but it still fails in some cases.
For example, with this input:
13wfe + 123dg Tetest_xt ldf_dfdlj_dfldjf_dfs test 123
It should find only this much:
13wfe + 123dg Tetest_xt
But it finds two separate matches:
13wfe + 123dg and _xt
Answer 1
Score: 1
If you want to treat any run of non-whitespace characters (\S+) as a "word", you can use:
^.*\S(?=[^\S\n]+[^\s_]+_)
^ Start of string
.* Match the whole line (the engine then backtracks as needed)
\S Match a non-whitespace char
(?= Positive lookahead
[^\S\n]+ Match 1+ whitespace chars without newlines
[^\s_]+_ Match 1+ non-whitespace chars other than _ and then match the _
) Close the lookahead
Note that if the _ can also be at the beginning of the word, you can use [^\s_]*_ where * repeats zero or more times.
See a regex demo.
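A minimal Python sketch of how the pattern above might be used, checked against both inputs from the question. The helper name prefix_before_last_underscore_word and the third sample string are hypothetical, chosen only for illustration.

```python
import re

# Pattern from the answer: greedy .* backtracks to the last position where a
# non-whitespace char is followed by whitespace and a word containing "_".
pattern = re.compile(r"^.*\S(?=[^\S\n]+[^\s_]+_)")

# Variant from the note: also accepts words that START with an underscore.
leading_ok = re.compile(r"^.*\S(?=[^\S\n]+[^\s_]*_)")

def prefix_before_last_underscore_word(line, regex=pattern):
    """Return the text before the last underscore word, or None if no match."""
    m = regex.search(line)
    return m.group() if m else None

first = prefix_before_last_underscore_word(
    "13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123")
second = prefix_before_last_underscore_word(
    "13wfe + 123dg Tetest_xt ldf_dfdlj_dfldjf_dfs test 123")
print(first)   # 13wfe + 123dg Text
print(second)  # 13wfe + 123dg Tetest_xt

# Hypothetical input with a leading underscore: only the variant matches.
print(prefix_before_last_underscore_word("a b _c d"))              # None
print(prefix_before_last_underscore_word("a b _c d", leading_ok))  # a b
```

Since .* is greedy, the engine finds the last qualifying underscore word rather than the first, which is why the second input returns a single match ending at Tetest_xt.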
To match everything up to the last word that has an underscore inside it (so not at the start or the end of the word), where a word consists only of word chars \w, use (?<!\S) and (?!\S) as the left- and right-hand whitespace boundaries:
^.*(?<!\S)\S+(?=[^\S\n]+[^\W_]+_\w+(?!\S))
See another regex demo.
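The stricter pattern can be tried the same way. This is only a sketch: the name strict and the last sample string are assumptions added for illustration.

```python
import re

# Stricter pattern from the answer: the underscore word must consist only of
# word chars, with the "_" inside the word, and be bounded by whitespace.
strict = re.compile(r"^.*(?<!\S)\S+(?=[^\S\n]+[^\W_]+_\w+(?!\S))")

samples = [
    "13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123",
    "13wfe + 123dg Tetest_xt ldf_dfdlj_dfldjf_dfs test 123",
    "foo bar_ baz",  # hypothetical: underscore at the END of a word
]

results = []
for line in samples:
    m = strict.search(line)
    results.append(m.group() if m else None)

print(results[0])  # 13wfe + 123dg Text
print(results[1])  # 13wfe + 123dg Tetest_xt
print(results[2])  # None
```

The last sample yields no match because \w+ after the underscore requires at least one more word character, so a trailing underscore does not qualify.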