re.sub a list of words, ignore case

Question

I am trying to wrap each word from a list in an HTML <b> element wherever it appears in a sentence. After some searching I got it almost working, except for ignoring case.

import re

bolds = ['test', 'tested']  # I want to bold these words, ignoring-case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)

Output:

Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.

This output misses the <b> element for Test and TEST. In other words, I would like the output to be:

<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.

One hack is to explicitly add the capitalized and upper-case variants of each word, like so:

bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]

But I am thinking there must be a better way to do this. Besides, the above hack will miss words like tesT, etc.
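To make the gap concrete, here is a small sketch of that hack. The `sample` string is a made-up input, not from my real data. It shows that a mixed-case word like tesT slips through the extended list, while the same pattern would match if the search ignored case.

```python
import re

bolds = ["test", "tested"]
# The hack: add capitalized and upper-case variants of each word.
variants = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]
pattern = r"\b(?:" + "|".join(variants) + r")\b"

sample = "a tesT here"  # hypothetical mixed-case input
missed = re.search(pattern, sample)             # case-sensitive search finds nothing
found = re.search(pattern, sample, flags=re.I)  # case-insensitive search matches
print(missed, found.group())
```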

Thank you!

Answer 1

Score: 2

There's no need for the dictionary or the function. Every replacement just wraps the matched text in the same two strings, so you can do it with a back-reference in the replacement string.

Use flags=re.I to make the match case-insensitive.

text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)

\g<0> is a back-reference that expands to the full match of the pattern, so the original casing of each word is preserved.
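Put together with the question's data, this might look like the sketch below. The `bold_words` helper and the `re.escape` call are my additions for illustration: escaping only matters if a word contains regex metacharacters, and the question's pattern works without it for plain words.

```python
import re

def bold_words(text, words):
    """Wrap each whole-word, case-insensitive match of `words` in <b>...</b>."""
    # Escape each word so regex metacharacters are matched literally
    # (an addition; the question's pattern joins the words unescaped).
    alternation = "|".join(re.escape(w) for w in words)
    pattern = rf"\b(?:{alternation})\b"
    # \g<0> expands to the whole match, so the original casing is kept.
    return re.sub(pattern, r"<b>\g<0></b>", text, flags=re.IGNORECASE)

bolds = ["test", "tested"]
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
text_bolded = bold_words(text, bolds)
print(text_bolded)
```

This should print the output asked for in the question, and mixed-case forms such as tesT get wrapped as well.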

huangapple
  • Posted on March 7, 2023, 08:05:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75656915.html