re.sub a list of words, ignore case
Question
I am trying to add the HTML <b> element to a list of words in a sentence. After some searching I got it almost working, except for ignoring case.
import re
bolds = ['test', 'tested'] # I want to bold these words, ignoring case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)
Output:
Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.
This output misses the <b> element for Test and TEST. In other words, I would like the output to be:
<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.
One hack is to explicitly add the capitalize() and upper() variants, like so:
bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]
But I think there must be a better way to do this. Besides, the hack above will still miss mixed-case words like tesT.
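To see the limitation concretely, here is a small check of the hack next to a case-insensitive match. The sample string "Test test TEST tesT" is made up for illustration; it is not from the question.

```python
import re

bolds = ['test', 'tested']
text = "Test test TEST tesT"  # hypothetical sample input

# The hack from the question: add Capitalized and UPPER variants explicitly.
hack = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]
print(hack)  # ['test', 'tested', 'Test', 'Tested', 'TEST', 'TESTED']

# Without a flag the match is case-sensitive, so only the listed spellings match.
hack_pattern = r'\b(?:' + '|'.join(hack) + r')\b'
print(re.findall(hack_pattern, text))  # ['Test', 'test', 'TEST'] (tesT is missed)

# For contrast: the two original words with re.IGNORECASE match every spelling.
plain_pattern = r'\b(?:' + '|'.join(bolds) + r')\b'
print(re.findall(plain_pattern, text, flags=re.IGNORECASE))  # ['Test', 'test', 'TEST', 'tesT']
```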
Thank you!
Answer 1
Score: 2
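There's no need for the dictionary or the function. Every replacement is just a fixed string wrapped around the matched text, and you can get the matched text with a back-reference. (Even with case-insensitive matching, the dictionary lookup would leave Test and TEST unchanged, because its keys are all lowercase.)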
Use flags=re.I to make the match case-insensitive.
text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)
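\g<0> is a back-reference that returns the full match of the pattern. Because it re-inserts the matched text exactly as it appeared, the original capitalization (Test, TEST, tesT) is preserved.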
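Putting the answer together with the question's input gives a minimal runnable version. Two things here are additions for illustration, not part of the answer: re.escape, which is a precaution in case a word ever contains regex metacharacters (e.g. "c++"), and a callback variant, included to show that wrapping m.group(0) is equivalent to the \g<0> template.

```python
import re

bolds = ['test', 'tested']  # words to bold, matched case-insensitively
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

# re.escape makes each word match literally (a no-op for plain words like these).
pattern = r'\b(?:' + '|'.join(map(re.escape, bolds)) + r')\b'

# \g<0> re-inserts the whole match unchanged, so the original capitalization survives.
text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)
print(text_bolded)
# <b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.

# Equivalent callback form: wrap whatever matched instead of looking it up in a dict.
text_bolded_fn = re.sub(pattern, lambda m: f'<b>{m.group(0)}</b>', text, flags=re.I)
print(text_bolded_fn == text_bolded)  # True
```

The template string is the simpler choice here; a callback only becomes useful when the replacement depends on the matched text in a way a back-reference cannot express.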