July 3, 2023 19:49:51


Python regex to match every word in a sentence until the last word that has an underscore in it

Question

I am trying to find a regex that matches every word in a sentence up to (but not including) the last word that has an underscore in it.

For example:

13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123

In this example, I am looking to get only

13wfe + 123dg Text

I have tried something along these lines:

^.*?(?=_)

but it returns this:

13wfe + 123dg Text ldf

You can find the regex here. Please advise how to handle this case.
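The result above is what a lazy quantifier produces: .*? expands one character at a time and stops at the first position where (?=_) succeeds, which is just before the underscore inside ldf_dfdlj_... rather than before the whole word. A minimal reproduction with Python's standard re module, using the sample string from the question:

```python
import re

text = "13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123"

# .*? is lazy: it stops at the FIRST position where the lookahead (?=_)
# succeeds, i.e. directly before the first underscore, which sits in the
# middle of the word "ldf_dfdlj_dfldjf_dfs".
m = re.match(r"^.*?(?=_)", text)
print(m.group())  # 13wfe + 123dg Text ldf
```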

Update: using the regex provided by @liginity, I am able to extract the substring, but it still fails in some cases.

For example, with this input:

13wfe + 123dg Tetest_xt ldf_dfdlj_dfldjf_dfs test 123

It should match only this part:

13wfe + 123dg Tetest_xt

But instead it finds multiple matches:

13wfe + 123dg and _xt

Answer 1

Score: 1

If you want to match any non-whitespace char as a "word", you can use \S+:

^.*\S(?=[^\S\n]+[^\s_]+_)

^ Start of string
.* Match the whole line
\S Match a non-whitespace char
(?= Positive lookahead
- [^\S\n]+ Match 1+ whitespace chars without newlines
- [^\s_]+_ Match 1+ non-whitespace chars other than _, and then match the _
) Close the lookahead

Note that if the _ can also be at the beginning of the word, you can use [^\s_]*_ where * repeats zero or more times.

See a regex demo.

To match everything up to the last word that has an underscore in it (where a word consists only of word chars \w, and the underscore is not at the start or the end of the word), using (?<!\S) and (?!\S) as left- and right-hand whitespace boundaries:

^.*(?<!\S)\S+(?=[^\S\n]+[^\W_]+_\w+(?!\S))

^ Start of string
.* Match the whole line
(?<!\S) Negative lookbehind, assert that there is no non-whitespace char directly to the left
\S+ Match 1+ non-whitespace chars
(?= Positive lookahead
- [^\S\n]+ Match 1+ whitespace chars without newlines
- [^\W_]+_\w+ Match 1+ word chars other than _, then match _ followed by 1+ word chars
- (?!\S) Negative lookahead, assert that there is no non-whitespace char directly to the right
) Close the lookahead

See another regex demo.
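Both patterns can be checked with Python's built-in re module, which supports the fixed-width lookbehind (?<!\S) used here. The sketch below runs them on the two inputs from the question, plus a hypothetical input "abc _x y" whose underscore word starts with _, to illustrate the [^\s_]*_ variant from the note. The helper prefix_before and the constant names are illustrative only, not part of any library.

```python
import re

# Pattern 1: a "word" is any run of non-whitespace chars; the next word must
# contain an _ after at least one char that is neither whitespace nor _.
PATTERN_ANY = re.compile(r"^.*\S(?=[^\S\n]+[^\s_]+_)")

# Variant from the note: [^\s_]*_ also accepts a next word that starts with _.
PATTERN_ANY_LEADING = re.compile(r"^.*\S(?=[^\S\n]+[^\s_]*_)")

# Pattern 2: the next word starts with a non-underscore word char, contains _
# followed by 1+ word chars, and is bounded by whitespace on both sides.
PATTERN_WORD = re.compile(r"^.*(?<!\S)\S+(?=[^\S\n]+[^\W_]+_\w+(?!\S))")


def prefix_before(pattern, text):
    """Hypothetical helper: return the matched prefix, or None if no match."""
    m = pattern.match(text)
    return m.group() if m else None


first = "13wfe + 123dg Text ldf_dfdlj_dfldjf_dfs test 123"
update = "13wfe + 123dg Tetest_xt ldf_dfdlj_dfldjf_dfs test 123"
leading = "abc _x y"  # hypothetical input: the underscore word starts with _

for text in (first, update):
    # Greedy .* backtracks from the end of the line, so the match stops
    # before the LAST word that satisfies the lookahead.
    print(prefix_before(PATTERN_ANY, text))
    print(prefix_before(PATTERN_WORD, text))

print(prefix_before(PATTERN_ANY, leading))          # None
print(prefix_before(PATTERN_ANY_LEADING, leading))  # abc
```

For the first input both patterns return 13wfe + 123dg Text, and for the input from the update both return 13wfe + 123dg Tetest_xt, because the greedy .* keeps Tetest_xt and stops before ldf_dfdlj_dfldjf_dfs. If no word with an underscore follows, match() returns None.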
