June 13, 2023 14:13:14


re.split with the same pattern structure, (!)|(!\?) and (!)|(e!), behaves differently in Python regex

Question

The sentence I want to split is:

sentence = "Break!me!?haha"

The first pattern is:

pattern = r'(!)|(!\?)'

Running this code:

import re
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word])  # drop None and empty strings

gives:

['Break', '!', None, 'me', '!', None, '?haha']
['Break', '!', 'me', '!', '?haha']

Only ! is used as the delimiter (at both positions), not ! and !?; the ? stays attached to haha.

But when I use this pattern

pattern = r'(!)|(e!)'

the same code prints:

['Break', '!', None, 'm', None, 'e!', '?haha']
['Break', '!', 'm', 'e!', '?haha']

This time the delimiters were ! and e!.

But as far as I can tell, both patterns have the same structure:

pattern = r'(!)|(!\?)'
pattern = r'(!)|(e!)'

Here is the working code: https://www.online-python.com/UrgFEQVCvR
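A quick way to see what each pattern actually matches (a diagnostic sketch, not code from the original post) is to list every match with re.finditer. match_report is a hypothetical helper name; m.start() gives the position of each match and m.lastindex the number of the capturing group, i.e. the alternative, that matched.

```python
import re

def match_report(pattern, text):
    """Return (start index, matched text, group number) for every match of pattern."""
    return [(m.start(), m.group(), m.lastindex) for m in re.finditer(pattern, text)]

sentence = "Break!me!?haha"
print(match_report(r'(!)|(!\?)', sentence))  # [(5, '!', 1), (8, '!', 1)]
print(match_report(r'(!)|(e!)', sentence))   # [(5, '!', 1), (7, 'e!', 2)]
```

With these two patterns, group 2 of (!)|(!\?) never fires, while (!)|(e!) matches e! at index 7, one position before the ! at index 8.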


Answer 1

Score: 1


The patterns are structured differently. In the first pattern, the two alternatives share the same starting character, so at that position the first alternative ((!)) is tried first and matches. In the second pattern they don't, so the second alternative matches first: the engine scans the string from left to right, and e! can be matched at the e, one position before the !.
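This explanation can be checked with a toy model. The sketch below is a simplification that only handles alternatives which are plain literal strings (leftmost_first is a hypothetical helper, not part of re): it scans start positions from left to right and, at each position, tries the alternatives in the order they are written, taking the first one that fits.

```python
import re

def leftmost_first(alternatives, text):
    """Toy model (literal alternatives only) of how re chooses between alt1|alt2|...:
    scan start positions left to right and, at each position, try the
    alternatives in the order written, returning the first one that fits."""
    for pos in range(len(text)):
        for number, alt in enumerate(alternatives, start=1):
            if text.startswith(alt, pos):
                return pos, alt, number  # start index, matched text, alternative number
    return None

text = "me!?haha"
print(leftmost_first(["!", "!?"], text))  # (2, '!', 1): "!" wins, "!?" is never reached
print(leftmost_first(["!", "e!"], text))  # (1, 'e!', 2): "e!" starts one position earlier

# The real engine agrees on the second case:
m = re.search(r'(!)|(e!)', text)
print(m.start(), m.group(), m.lastindex)  # 1 e! 2
```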

If you want (!\?) to take precedence over (!), you need to place it first:

pattern = r'(!\?)|(!)'

(although you could simply use

pattern = r'(!\??)'

instead, because the ? quantifier is greedy).
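The effect of the greedy ? can be seen directly with re.split. This is a small sketch on the question's sentence; the lazy variant r'(!\???)' (an escaped ? followed by the lazy quantifier ??) is added here only for contrast. Because each pattern has a single capturing group, re.split returns no None placeholders.

```python
import re

sentence = "Break!me!?haha"

# Greedy optional ?: takes "!?" when it can, otherwise just "!"
print(re.split(r'(!\??)', sentence))   # ['Break', '!', 'me', '!?', 'haha']

# Lazy optional ??: prefers to skip the ?, so "!?" is never consumed whole
print(re.split(r'(!\???)', sentence))  # ['Break', '!', 'me', '!', '?haha']
```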

Answer 2

Score: 0


For the pattern r'(!)|(!\?)', the regex engine tries the alternatives in order at each position, so it only attempts (!\?) if (!) fails to match there. But since (!\?) also starts with !, (!) can only fail where (!\?) would fail too, so (!\?) will never match. Flipping the alternatives around:

import re

sentence = "Break!me!?haha"
pattern = r'(!\?)|(!)'
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word])  # drop None and empty strings
# Output:
# ['Break', None, '!', 'me', '!?', None, 'haha']
# ['Break', '!', 'me', '!?', 'haha']

For the second pattern, r'(!)|(e!)', (!) can fail at a position where (e!) succeeds (at the e of me!), which is exactly what happens.
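Both points can be checked with re.match, which only tries to match at the start of the string and so isolates what happens at a single position. This is a minimal sketch; the short test strings are chosen only for illustration.

```python
import re

# At a single position, alternatives are tried in the order written.
print(re.match(r'(!)|(!\?)', "!?").group())  # prints: !   ((!) succeeds, so (!\?) is never tried)
print(re.match(r'(!\?)|(!)', "!?").group())  # prints: !?  ((!\?) is tried first and succeeds)

# (!) can fail where (e!) succeeds: at the start of "e!", only the second alternative fits.
print(re.match(r'(!)|(e!)', "e!").group())   # prints: e!
```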
