Splitting with similarly structured patterns (!)|(!\?) and (!)|(e!) behaves differently in Python regex
Question
The sentence I want to split is this:
import re
sentence = "Break!me!?haha"
The first pattern is:
pattern = r'(!)|(!\?)'
Running this code:
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])
gives these results:
['Break', '!', None, 'me', '!', None, '?haha']
['Break', '!', 'me', '!', '?haha']
The delimiter is ! both times, and the ? is left attached to haha. I expected the delimiters to be ! and !?.
But when I use this pattern:
pattern = r'(!)|(e!)'
the same code gives:
['Break', '!', None, 'm', None, 'e!', '?haha']
['Break', '!', 'm', 'e!', '?haha']
This time the delimiters are ! and e!, as expected.
But I think these two patterns have the same structure:
pattern = r'(!)|(!\?)'
pattern = r'(!)|(e!)'
Answer 1
Score: 1
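The patterns are structured differently. The regex engine scans the string from left to right and, at each position, tries the alternatives in the order they are written. In the first pattern both alternatives start with the same character, so wherever (!\?) could match, (!) matches first and (!\?) is never tried. In the second pattern the alternatives start with different characters: at the e in me! the alternative (!) fails, so (e!) matches there, one position before (!) would get its chance.
If you want (!\?) to take precedence over (!), you need to place it first:
pattern = r'(!\?)|(!)'
(although you could simply use
pattern = r'(!\??)'
instead, because the ? quantifier is greedy and consumes the \? whenever it is present).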
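The two suggestions above can be checked with a short script. This is only a sketch: the helper name split_keep is made up for illustration, and it drops the None and empty entries the same way the filter in the question does.

```python
import re

sentence = "Break!me!?haha"


def split_keep(pattern, text):
    """Split text on pattern, dropping the None and empty-string entries
    that re.split emits for capture groups that did not participate."""
    return [part for part in re.split(pattern, text) if part]


# Longer alternative first, so "!?" is tried before "!" at each position.
ordered = split_keep(r'(!\?)|(!)', sentence)

# One group with an optional "?"; the greedy quantifier takes it when present.
optional = split_keep(r'(!\??)', sentence)

print(ordered)   # ['Break', '!', 'me', '!?', 'haha']
print(optional)  # ['Break', '!', 'me', '!?', 'haha']
```

Both variants give the same list for this sentence. The single-group form also avoids the None placeholders in the raw re.split output, since there is only one capture group and it always participates in the match.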
Answer 2
Score: 0
For the pattern r'(!)|(!\?)', the regex engine only attempts (!\?) at a position where (!) has already failed to match. But if (!) does not match at a position, (!\?) cannot match there either, so the second alternative never matches. Flipping the alternatives around:
import re

sentence = "Break!me!?haha"
pattern = r'(!\?)|(!)'
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])
# output
['Break', None, '!', 'me', '!?', None, 'haha']
['Break', '!', 'me', '!?', 'haha']
For the second pattern, r'(!)|(e!)', (!) can fail at a position where (e!) still matches, which is exactly what happens at the e in me!.
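To see where each match starts and which alternative produced it, one option is to walk the matches with re.finditer. This is a rough sketch rather than part of the original answer; the helper name trace is hypothetical.

```python
import re

sentence = "Break!me!?haha"


def trace(pattern, text):
    """Return (start index, matched text, capture group number) per match,
    in the left-to-right order the engine finds them."""
    return [(m.start(), m.group(0), m.lastindex)
            for m in re.finditer(pattern, text)]


# Both alternatives start with "!", so group 1 wins at every "!".
print(trace(r'(!)|(!\?)', sentence))  # [(5, '!', 1), (8, '!', 1)]

# At index 7 (the "e") group 1 fails and group 2 matches "e!",
# before the engine ever reaches the "!" at index 8.
print(trace(r'(!)|(e!)', sentence))   # [(5, '!', 1), (7, 'e!', 2)]
```

The start indices show that the deciding factor is which position the engine reaches first, and only then the order of the alternatives at that position.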