re.split with the same pattern structure, (!)|(!\?) and (!)|(e!), behaves differently in Python regex

Question

This is the sentence I want to split:

sentence = "Break!me!?haha"

The first pattern is:

pattern = r'(!)|(!\?)'

This code:

import re

print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])

prints:

['Break', '!', None, 'me', '!', None, '?haha']
['Break', '!', 'me', '!', '?haha']

Only ! is used as a delimiter (both times). !? is never matched, so the ? ends up in '?haha' instead of !? being split off.

But when I use this pattern:

pattern = r'(!)|(e!)'

the same code prints:

['Break', '!', None, 'm', None, 'e!', '?haha']
['Break', '!', 'm', 'e!', '?haha']

This time the delimiters were ! and e!.

But as far as I can tell, the two patterns have the same structure:

pattern = r'(!)|(!\?)'
pattern = r'(!)|(e!)'

Here is the working code:

https://www.online-python.com/UrgFEQVCvR

Answer 1

Score: 1


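The patterns are structured differently. The regex engine scans the string from left to right and, at each position, tries the alternatives in order, taking the first one that matches rather than the longest. In the first pattern, both alternatives start with !, so at any position where (!\?) could match, (!) is tried first and already succeeds. In the second pattern, the alternatives start with different characters, so (e!) can match at a position where (!) fails: e! starts at the e in "me", one character before the !, so the left-to-right scan finds it first.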
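A quick way to check this (a sketch added for illustration, not part of the original answer) is to list every match together with its start index. The helper name describe_matches is hypothetical; m.lastindex is the standard Match attribute giving the number of the last capturing group that took part in the match, which here tells you which alternative won.

```python
import re

def describe_matches(pattern, text):
    """Hypothetical helper: (start index, matched text, winning group number) per match."""
    return [(m.start(), m.group(), m.lastindex) for m in re.finditer(pattern, text)]

sentence = "Break!me!?haha"
for pattern in (r'(!)|(!\?)', r'(!)|(e!)'):
    print(pattern, describe_matches(pattern, sentence))
# (!)|(!\?) [(5, '!', 1), (8, '!', 1)]
# (!)|(e!) [(5, '!', 1), (7, 'e!', 2)]
```

With the first pattern, group 1 wins both times. With the second, group 2 matches e! at index 7, before the scan reaches the ! at index 8.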

If you want (!\?) to take precedence over (!), you need to place it first:

pattern = r'(!\?)|(!)'

(although you could simply use

pattern = r'(!\??)'

instead, because the trailing ? quantifier is greedy and consumes a literal ? whenever one follows the !).
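Both points can be checked with a short sketch (added for illustration, not part of the original answer). With a single capturing group that always participates in the match, re.split inserts no None entries, so the filtering step from the question is not needed for this sentence.

```python
import re

sentence = "Break!me!?haha"

# Alternation is ordered: at a given position the first alternative that matches
# wins, even when a later alternative would match more text.
print(re.match(r'(!)|(!\?)', '!?').group())  # !
print(re.match(r'(!\?)|(!)', '!?').group())  # !?

# One group with an optional "?": the greedy quantifier takes the "?" when it is
# present and falls back to a bare "!" otherwise.
parts = re.split(r'(!\??)', sentence)
print(parts)  # ['Break', '!', 'me', '!?', 'haha']
```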

Answer 2

Score: 0


For the pattern r'(!)|(!\?)', the regex engine tries the alternatives in order at each position and only attempts (!\?) if (!) fails there. But (!) can only fail where there is no !, and at such a position (!\?) cannot match either, so (!\?) never matches. Flipping the alternatives around fixes this:

import re

sentence = "Break!me!?haha"
pattern = r'(!\?)|(!)'

print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])

# output
['Break', None, '!', 'me', '!?', None, 'haha']
['Break', '!', 'me', '!?', 'haha']

For the second pattern, r'(!)|(e!)', (!) can fail at a position where (e!) succeeds (the e in "me"), which is exactly what happens.
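The position-by-position behaviour can be checked directly with a compiled pattern's match(string, pos) method, which only attempts a match starting exactly at pos. This is a sketch added for illustration; the variable names first and second are hypothetical.

```python
import re

sentence = "Break!me!?haha"
first = re.compile(r'(!)|(!\?)')
second = re.compile(r'(!)|(e!)')

# Index 8 is the "!" before "?": group 1 of `first` already succeeds there,
# so (!\?) is never tried.
m = first.match(sentence, 8)
print(m.group(), m.lastindex)  # ! 1

# Index 7 is the "e" in "me": (!) fails but (e!) succeeds, so "e!" is consumed
# before the scan ever reaches index 8.
m = second.match(sentence, 7)
print(m.group(), m.lastindex)  # e! 2

# The first pattern has no match starting at index 7 at all.
print(first.match(sentence, 7))  # None
```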

huangapple
  • Posted on June 13, 2023 at 14:13:14
  • If you repost this article, please keep the link: https://go.coder-hub.com/76462106.html