2023-03-15 21:10:12

Extract digits from a string within a word

Question

I want a regular expression that returns only the digits that occur within a word, but I can only find expressions that return all digits in a string.

I've used this example:

text = 'I need this number inside my wor5d, but also this word3 and this 4word, but not this 1 and not this 555.'

The following code returns all digits, but I am only interested in ['5', '3', '4']:

import re
print(re.findall(r'\d+', text))

Any suggestions?

Answer 1

Score: 1

You can use

re.findall(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])', text)

This regex extracts every run of one or more digits that is immediately preceded or immediately followed by an ASCII letter.
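As a quick check, here is a minimal sketch that runs this pattern against the sample sentence from the question. The names `WORD_DIGITS` and `found` are only illustrative and are not part of the original answer.

```python
import re

# Sample sentence from the question.
text = ('I need this number inside my wor5d, but also this word3 '
        'and this 4word, but not this 1 and not this 555.')

# Digit runs with an ASCII letter directly before (lookbehind)
# or directly after (lookahead) them.
WORD_DIGITS = re.compile(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])')

found = WORD_DIGITS.findall(text)
print(found)  # ['5', '3', '4']
```

The standalone `1` and `555` are skipped because their neighbours are spaces or punctuation, so neither the lookbehind nor the lookahead succeeds.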

A fully Unicode-aware version for Python's re module would look like

(?<=[^\W\d_])\d+|\d+(?=[^\W\d_])

where [^\W\d_] matches any Unicode letter.
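To illustrate the difference between the two patterns, the sketch below applies both to a string containing non-ASCII letters. The sample string and the variable names are assumptions made up for this illustration.

```python
import re

ASCII_DIGITS = re.compile(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])')
UNICODE_DIGITS = re.compile(r'(?<=[^\W\d_])\d+|\d+(?=[^\W\d_])')

# 'café3 and 4über, but not 12', written with escapes so the accented
# letters are single precomposed code points.
sample = 'caf\u00e93 and 4\u00fcber, but not 12'

ascii_hits = ASCII_DIGITS.findall(sample)
unicode_hits = UNICODE_DIGITS.findall(sample)

print(ascii_hits)    # []
print(unicode_hits)  # ['3', '4']
```

The ASCII pattern finds nothing here because the digits touch only `é` and `ü`, which fall outside `[a-zA-Z]`.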

See the regex demo for reference.

Answer 2

Score: -1

**An approach with [`str.translate`][1]**, without using *regex* or the `re` module:
```python3
from string import ascii_letters

# Translation table that deletes every ASCII letter.
delete_dict = {sp_character: '' for sp_character in ascii_letters}
table = str.maketrans(delete_dict)
text = 'I 77! need 1:5 this number inside my wor5d, but also this word3 and this 4word, but not this 1 and not this 555.'
# Keep a token only if it is not purely numeric and stripping its letters leaves only digits.
print([res for s in text.rstrip('.').split()
       if not (s2 := s.rstrip(',')).isnumeric() and (res := s2.translate(table)) and res.isnumeric()])
```

Out:

['5', '3', '4']
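The two answers agree on the question's sample, but they are not equivalent in general. The sketch below wraps each approach in a helper (`digits_via_translate` and `digits_via_regex` are hypothetical names introduced here) and compares them on a made-up input where they diverge.

```python
import re
from string import ascii_letters

_TABLE = str.maketrans({ch: '' for ch in ascii_letters})
_NUM_RE = re.compile(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])')


def digits_via_translate(text):
    # Same logic as the str.translate answer, wrapped in a function.
    return [res for s in text.rstrip('.').split()
            if not (s2 := s.rstrip(',')).isnumeric()
            and (res := s2.translate(_TABLE)) and res.isnumeric()]


def digits_via_regex(text):
    # Same pattern as the regex answer.
    return _NUM_RE.findall(text)


sample = 'a1b2 and x9!'
print(digits_via_translate(sample))  # ['12']
print(digits_via_regex(sample))      # ['1', '2', '9']
```

Deleting the letters joins digits from the same token (`a1b2` becomes `12`), and any token with leftover punctuation such as `x9!` is dropped, whereas the regex reports each digit run separately.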

Performance

I was curious, so I benchmarked this against the other approaches. On this input, str.translate comes out faster than even the regex implementations.

Here is my benchmark code, using timeit:

import re
from string import ascii_letters
from timeit import timeit

# Precompiled pattern: digit runs directly preceded or followed by an ASCII letter.
_NUM_RE = re.compile(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])')
# Translation table that deletes every ASCII letter.
delete_dict = {sp_character: '' for sp_character in ascii_letters}
_TABLE = str.maketrans(delete_dict)
text = 'I need this number inside my wor5d, but also this word3 and this 4word, but not this 1 and not this 555.'

def main():
    n = 100_000
    # The outer string is raw so that \d reaches the timed statement without an invalid-escape warning.
    print('regex:         ', timeit(r"re.findall(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])', text)",
                 globals=globals(), number=n))
    print('regex (opt):   ', (timeit("_NUM_RE.findall(text)",
                 globals=globals(), number=n)))
    # Character-by-character scan that collects matching digits into a set.
    print('iter_char:     ', timeit("""
k=set()
for x in range(1,len(text)-1):
    if text[x-1].isdigit() and text[x].isalpha():
        k.add(text[x-1])
    if text[x].isdigit() and text[x+1].isalpha():
        k.add(text[x])
    if text[x-1].isalpha() and text[x].isdigit() and text[x+1].isalpha():
        k.add(text[x])
    if text[x-1].isalpha() and text[x].isdigit():
        k.add(text[x])
    """, globals=globals(), number=n))
    print('str.translate: ', timeit("""
[
    res for s in text.rstrip('.').split()
    if not (s2 := s.rstrip(',')).isnumeric() and (res := s2.translate(_TABLE)) and res.isnumeric()
]
    """, globals=globals(), number=n))

if __name__ == '__main__':
    main()

Results (Mac OS X - M1), in seconds per 100,000 runs:

regex:          0.5315765410050517
regex (opt):    0.5069837079936406
iter_char:      2.5037198749923846
str.translate:  0.37348733299586456
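Single `timeit` runs can be noisy, so one way to make such a comparison more robust is to repeat each measurement and keep the minimum. The following is only a sketch under that assumption: the helper names, the run counts, and the `timings` dictionary are illustrative and not part of the original benchmark. The absolute numbers will differ by machine and Python version.

```python
import re
from string import ascii_letters
from timeit import repeat

text = ('I need this number inside my wor5d, but also this word3 '
        'and this 4word, but not this 1 and not this 555.')
_NUM_RE = re.compile(r'(?<=[a-zA-Z])\d+|\d+(?=[a-zA-Z])')
_TABLE = str.maketrans({ch: '' for ch in ascii_letters})


def via_regex():
    return _NUM_RE.findall(text)


def via_translate():
    return [res for s in text.rstrip('.').split()
            if not (s2 := s.rstrip(',')).isnumeric()
            and (res := s2.translate(_TABLE)) and res.isnumeric()]


# Best of 5 repeats, 2,000 calls per repeat (both counts chosen arbitrarily).
timings = {
    func.__name__: min(repeat(func, number=2_000, repeat=5))
    for func in (via_regex, via_translate)
}

for name, seconds in timings.items():
    print(f'{name:<14} {seconds:.4f} s')
```

Taking the minimum reports the run least disturbed by other activity on the machine, which is the approach the `timeit` documentation suggests for `repeat()`.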
