re.split with the same pattern structure, (!)|(!\?) and (!)|(e!), behaves differently in Python regex
Question
The sentence I want to split is:
sentence = "Break!me!?haha"
The first pattern is:
pattern = r'(!)|(!\?)'
This code:
import re
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])
prints:
['Break', '!', None, 'me', '!', None, '?haha']
['Break', '!', 'me', '!', '?haha']
Both delimiters are !, not ! and !?; the ? stays attached to haha.
But when I use this pattern:
pattern = r'(!)|(e!)'
the same code prints:
['Break', '!', None, 'm', None, 'e!', '?haha']
['Break', '!', 'm', 'e!', '?haha']
This time, the delimiters are ! and e!.
But as far as I can tell, both patterns have the same structure:
pattern = r'(!)|(!\?)'
pattern = r'(!)|(e!)'
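One way to see what each pattern actually matched is to list the matches with re.finditer, which reports where each match starts and which capture group took part in it. A minimal diagnostic sketch, reusing the sentence and both patterns from above:

```python
import re

sentence = "Break!me!?haha"

for pattern in (r'(!)|(!\?)', r'(!)|(e!)'):
    print(pattern)
    # Each match: start index, matched text, and the tuple of capture groups
    # (None marks the alternative that did not participate in that match).
    for m in re.finditer(pattern, sentence):
        print(' ', m.start(), repr(m.group(0)), m.groups())
```

With the first pattern, group 2 is None in every match. With the second pattern, the second match starts at index 7, on the e, one character before the second !.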
Here is the working code
Answer 1
Score: 1
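The patterns are structured differently. The regex engine scans the string from left to right and returns the leftmost match; the order of the alternatives only decides which one wins when several of them can match at the same position. In the first pattern, both alternatives start with the same character, so they always compete at the same position and the first alternative, (!), wins. In the second pattern, they don't share a starting character, so the second alternative matches first (because e! can be matched one position before the !).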
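The rule above (leftmost position first, then alternative order) can be modelled for alternatives that are plain literal strings. The function leftmost_first below is a hypothetical helper written only for illustration: it is a simplified model of the search order under the assumption that every alternative is a literal, not how Python's re module is implemented.

```python
import re

def leftmost_first(alternatives, text):
    """Hypothetical model: return (position, alternative index, matched text)
    for the first match, trying every start position from left to right and,
    at each position, the literal alternatives in the order they are listed."""
    for pos in range(len(text)):
        for index, alt in enumerate(alternatives):
            if text.startswith(alt, pos):
                return pos, index, alt
    return None

rest = "me!?haha"  # the part of the sentence left after splitting off "Break!"

# First pattern: both alternatives can only start at the '!', so order decides.
print(leftmost_first(['!', '!?'], rest))     # (2, 0, '!')
print(re.search(r'(!)|(!\?)', rest).span())  # (2, 3)

# Second pattern: 'e!' starts one position earlier, so position decides.
print(leftmost_first(['!', 'e!'], rest))     # (1, 1, 'e!')
print(re.search(r'(!)|(e!)', rest).span())   # (1, 3)
```

In both cases the model agrees with re.search on where the match starts and ends.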
If you want (!\?) to take precedence over (!), you need to place it first:
pattern = r'(!\?)|(!)'
(Alternatively, you could simply use
pattern = r'(!\??)'
because the ? quantifier is greedy: it matches the optional ? whenever one follows the !.)
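A quick check of the single-group version, reusing the sentence from the question. Because this pattern has only one capture group, re.split returns each delimiter once and never inserts None, so no filtering is needed:

```python
import re

sentence = "Break!me!?haha"

# '\??' makes the literal '?' optional; being greedy, it takes the '?' whenever one follows '!'.
print(re.split(r'(!\??)', sentence))  # ['Break', '!', 'me', '!?', 'haha']
```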
Answer 2
Score: 0
For the pattern r'(!)|(!\?)', the regex engine only tries (!\?) at a position where (!) has failed. But any text matched by (!\?) starts with !, so wherever (!) fails, (!\?) fails as well, and the second alternative can never match. Flipping the alternatives around:
import re
sentence = "Break!me!?haha"
pattern = r'(!\?)|(!)'
print(re.split(pattern, sentence))
print([word for word in re.split(pattern, sentence) if word is not None and word != ""])
#output
['Break', None, '!', 'me', '!?', None, 'haha']
['Break', '!', 'me', '!?', 'haha']
For the second pattern, r'(!)|(e!)', (!) can fail at a position where (e!) succeeds, which is exactly what happens at the e just before the second !.
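Since e! is found one position before the !, the order of the alternatives should make no difference for the second pattern. A small sketch comparing both orderings; split_nonempty is a hypothetical helper that applies the same None/empty-string filtering as the list comprehension above:

```python
import re

sentence = "Break!me!?haha"

def split_nonempty(pattern, text):
    # Hypothetical helper: drop None entries (non-participating groups) and empty strings.
    return [word for word in re.split(pattern, text) if word]

print(split_nonempty(r'(!)|(e!)', sentence))  # ['Break', '!', 'm', 'e!', '?haha']
print(split_nonempty(r'(e!)|(!)', sentence))  # ['Break', '!', 'm', 'e!', '?haha']
```

Unlike the first pattern, swapping the alternatives here changes nothing, because the two alternatives never compete at the same position in this sentence.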