2023-03-07 08:05:10

re.sub a list of words, ignore case

Question

I am trying to wrap each word from a list in the HTML <b> element wherever it appears in a sentence. After some searching I almost have it working, except for ignoring case.

import re

bolds = ['test', 'tested']  # I want to bold these words, ignoring case
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."

pattern = r'\b(?:' + "|".join(bolds) + r')\b'
dict_repl = {k: f'<b>{k}</b>' for k in bolds}
text_bolded = re.sub(pattern, lambda m: dict_repl.get(m.group(), m.group()), text)
print(text_bolded)

Output:

Test lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur TEST adipiscing elit <b>test</b>.

This output misses the <b> element for Test and TEST. In other words, I would like the output to be:

<b>Test</b> lorem <b>tested</b> ipsum dolor sit amet <b>test</b>, consectetur <b>TEST</b> adipiscing elit <b>test</b>.

One hack is to explicitly add the capitalized and uppercase variants, like so:

bolds = bolds + [b.capitalize() for b in bolds] + [b.upper() for b in bolds]

But I think there must be a better way to do this. Besides, the hack above still misses mixed-case words like tesT.

Thank you!

Answer 1

Score: 2

There's no need for the dictionary or the function. Every replacement is just a fixed string wrapped around the matched text, and you can get the matched text with a back-reference.

Use flags=re.I to make the match case-insensitive.

text_bolded = re.sub(pattern, r'<b>\g<0></b>', text, flags=re.I)

\g<0> is a back-reference that returns the full match of the pattern.
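
Putting the pieces together, here is a minimal end-to-end sketch using the question's sample data. The helper name bold_words is made up for illustration. The re.escape call, the longest-first sort, and the empty-list guard are defensive additions that are not part of the original answer; they only matter if the word list could contain regex metacharacters or be empty.

```python
import re


def bold_words(text, words):
    """Wrap whole-word, case-insensitive occurrences of `words` in <b> tags.

    `bold_words` is a hypothetical helper name, not from the original post.
    """
    if not words:
        # An empty alternation would match at every word boundary,
        # so return the text unchanged instead.
        return text
    # Assumption: words may contain regex metacharacters, so escape them.
    # Sorting longest-first is a defensive habit for alternations.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = r"\b(?:" + alternatives + r")\b"
    # \g<0> inserts the matched text itself, so the original casing is kept.
    return re.sub(pattern, r"<b>\g<0></b>", text, flags=re.IGNORECASE)


bolds = ["test", "tested"]
text = "Test lorem tested ipsum dolor sit amet test, consectetur TEST adipiscing elit test."
print(bold_words(text, bolds))
```

Because the replacement reuses the matched text rather than looking it up in a dictionary, mixed-case spellings such as tesT are wrapped too, with their casing preserved.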
