2023-05-21 03:41:20

How to write a Python regex that matches strings with both words and digits, excluding digits-only strings?

Question

I want to write a regex that matches a string that may contain both words and digits and not digits only.

I used this regex [A-z+\d*], but it does not work.

Some matched samples:

expression123
123expression
exp123ression

Not matched sample:

1235234567544

Can you help me with this one? Thank you in advance.

Answer 1

Score: 7

Lookarounds to the rescue!

^(?!\d+$)\w+$

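This uses a negative lookahead and anchors: (?!\d+$) rejects any string that consists of digits only, and \w+ then requires the whole string to be made of word characters. See a demo on regex101.com.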
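As a minimal sketch of how the expression above could be applied in Python, the snippet below runs it against the four samples from the question. The names PATTERN and matched are only illustrative and are not part of the original answer.

```python
import re

# Pattern from the answer: the lookahead rejects digits-only strings,
# then \w+ requires word characters from start to end.
PATTERN = re.compile(r"^(?!\d+$)\w+$")

samples = ["expression123", "123expression", "exp123ression", "1235234567544"]

# Keep only the samples the pattern accepts.
matched = [s for s in samples if PATTERN.search(s)]
print(matched)  # ['expression123', '123expression', 'exp123ression']
```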


Note that you could get the same result for these samples with plain Python, without a regex:

samples = ["expression123", "123expression", "exp123ression", "1235234567544"]

filtered = [item for item in samples if not item.isdigit()]
print(filtered)

['expression123', '123expression', 'exp123ression']

See another demo on ideone.com.

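The two approaches differ on input strings like -1 or 1.0: the isdigit() filter keeps them, since they do not consist of digits only, while the regex rejects them because - and . are not matched by \w.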
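To see how inputs such as -1 or 1.0 are treated, a small check along the following lines may help. The list edge_cases and the variable names are assumptions made for illustration only.

```python
import re

PATTERN = re.compile(r"^(?!\d+$)\w+$")

# Hypothetical edge cases: a signed number, a decimal, and a normal sample.
edge_cases = ["-1", "1.0", "abc1"]

# The regex only accepts strings made entirely of word characters.
regex_kept = [s for s in edge_cases if PATTERN.search(s)]

# The pure-Python filter keeps everything that is not digits-only.
isdigit_kept = [s for s in edge_cases if not s.isdigit()]

print(regex_kept)    # ['abc1']
print(isdigit_kept)  # ['-1', '1.0', 'abc1']
```

If signed or decimal numbers should also be excluded, they would need to be handled explicitly, for example by trying to parse the string as a number first.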


Tests

Since a discussion about performance arose, here is a small benchmark for different sample sizes and expressions:

import random
import re
import string
import timeit


class RegexTester:
    # raw strings keep the backslashes in the patterns intact
    expressions_to_test = {"Cary": r"^(?=.*\D)\w+$",
                           "Jan": r"^(?!\d+$)\w+$"}

    def __init__(self, sample_size=100, word_size=10, times=100):
        self.sample_size = sample_size
        self.word_size = word_size
        self.times = times

        # generate samples
        self.samples = ["".join(random.choices(string.ascii_letters + string.digits, k=self.word_size))
                        for _ in range(self.sample_size)]

        # compile the expressions per instance, leaving the class-level dict untouched
        self.compiled = {key: re.compile(expression)
                         for key, expression in self.expressions_to_test.items()}

    def describe_sample(self):
        # return the samples that consist of digits only
        only_digits = [item for item in self.samples if all(char.isdigit() for char in item)]
        return only_digits

    def test_expressions(self):

        def regex_test(samples, expr):
            return [expr.search(item) for item in samples]

        for key, expr in self.compiled.items():
            t = timeit.Timer(lambda: regex_test(self.samples, expr))

            # NOTE: the timer always runs 100 iterations; self.times is only echoed in the output
            print("{key}, Times: {times}, Result: {result}".format(key=key,
                                                                   times=self.times,
                                                                   result=t.timeit(100)))


rt = RegexTester(sample_size=10 ** 5, word_size=10, times=10 ** 4)
# rt.describe_sample()
rt.test_expressions()

For a sample size of 10^5 and a word size of 10, both expressions gave comparable results:

Cary, Times: 10000, Result: 6.1406331
Jan, Times: 10000, Result: 5.948537699999999

With a sample size of 10^4 and a word size of 10^3, the picture is the same, and both expressions again perform comparably:

Cary, Times: 10000, Result: 10.1723557
Jan, Times: 10000, Result: 9.697761900000001

You'll get significant differences when the strings consist only of digits (i.e. the samples are generated from digits alone):

Cary, Times: 10000, Result: 25.4842013
Jan, Times: 10000, Result: 17.3708319

Note that this is randomly generated text and due to the method of generating it, the longer the strings are, the less likely they are to consist only of numbers. In the end it will depend on the actual text inputs.
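The digits-only case can be reproduced with a sketch like the one below, which draws the samples from string.digits alone. The sample count, word size, seed, and iteration count are assumptions chosen to keep the run short; they are not the values behind the timings reported above, so the numbers will differ.

```python
import random
import re
import string
import timeit

random.seed(0)  # assumed fixed seed, only to make the samples repeatable

# Digits-only samples: every string should be rejected by both expressions.
digit_samples = ["".join(random.choices(string.digits, k=10)) for _ in range(1000)]

expressions = {
    "Cary": re.compile(r"^(?=.*\D)\w+$"),
    "Jan": re.compile(r"^(?!\d+$)\w+$"),
}

for name, expr in expressions.items():
    # Sanity check: no digits-only string may match.
    assert not any(expr.search(s) for s in digit_samples)
    elapsed = timeit.timeit(lambda: [expr.search(s) for s in digit_samples], number=10)
    print(f"{name}: {elapsed:.4f}s")
```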

Answer 2

Score: 2

Another solution: simply search the string for any character other than a digit:

import re

data = [
    'expression123',
    '123expression',
    'exp123ression',
    '1235234567544',
]

for t in data:
    m = re.search(r'\D', t)
    if m:
        print(t)

Prints:

expression123
123expression
exp123ression
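One thing worth keeping in mind with this approach is that \D matches any non-digit, not just letters. The sketch below illustrates this with two extra inputs, 12-34 and 12 34, which are assumptions added for illustration and do not appear in the question.

```python
import re

# The last two entries are hypothetical inputs containing punctuation and a space.
data = ["expression123", "1235234567544", "12-34", "12 34"]

# A string is kept as soon as it contains one non-digit character.
has_non_digit = [t for t in data if re.search(r"\D", t)]
print(has_non_digit)  # ['expression123', '12-34', '12 34']
```

If only word characters should be allowed, an anchored pattern with \w, as in the first answer, is the stricter choice.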

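Answer 3

Score: 2

You may attempt to match the following regular expression.

^(?:\w*[a-zA-Z_]\w*)?$

Demo

This also matches the empty string. If the string must contain at least one character, it can be simplified to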

^\w*[a-zA-Z_]\w*$
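The difference between the two variants shows up only on the empty string. A minimal sketch, with the names ALLOW_EMPTY and NON_EMPTY chosen here purely for illustration:

```python
import re

ALLOW_EMPTY = re.compile(r"^(?:\w*[a-zA-Z_]\w*)?$")
NON_EMPTY = re.compile(r"^\w*[a-zA-Z_]\w*$")

for s in ["", "exp123ression", "1235234567544"]:
    # Print the string and whether each variant accepts it.
    print(repr(s), bool(ALLOW_EMPTY.search(s)), bool(NON_EMPTY.search(s)))

# '' True False
# 'exp123ression' True True
# '1235234567544' False False
```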

Answer 4

Score: 2

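Note that [A-z] matches more than [A-Za-z]: the range also covers the ASCII characters between Z and a, namely [, \, ], ^, _ and `.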
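A quick way to check which extra characters [A-z] accepts is to enumerate the ASCII code points between Z and a. The sketch below also tests the character class from the question, where + and * inside the brackets are literal characters rather than quantifiers; the variable names are illustrative only.

```python
import re

# Characters between 'Z' (code 90) and 'a' (code 97) in ASCII.
between = [chr(c) for c in range(ord("Z") + 1, ord("a"))]
print(between)  # ['[', '\\', ']', '^', '_', '`']

wide = re.compile(r"[A-z]")
narrow = re.compile(r"[A-Za-z]")

extra_wide = [ch for ch in between if wide.fullmatch(ch)]
extra_narrow = [ch for ch in between if narrow.fullmatch(ch)]
print(len(extra_wide), len(extra_narrow))  # 6 0

# The class from the question matches exactly one character, including a literal '+'.
question_class = re.compile(r"[A-z+\d*]")
print(bool(question_class.fullmatch("+")), bool(question_class.fullmatch("ab")))  # True False
```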

If you want to check for alphanumeric strings that are not digits only in Python 3:

strings = [
    "expression123",
    "123expression",
    "exp123ression",
    "1235234567544",
]

for s in strings:
    if not s.isnumeric() and s.isalnum():
        print(s)

Output

expression123
123expression
exp123ression

Note that both .isnumeric() and .isalnum() are Unicode-aware.
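What Unicode awareness means in practice can be sketched with a few non-ASCII inputs. The sample strings below are assumptions picked for illustration: Arabic-Indic digits, a vulgar fraction, an accented letter followed by a digit, and a string containing an underscore.

```python
samples = [
    "\u0661\u0662\u0663",  # Arabic-Indic digits one, two, three: numeric
    "\u00bd",              # vulgar fraction one half: numeric
    "\u00e91",             # e with acute accent followed by 1: alphanumeric, not numeric
    "abc_1",               # the underscore is not alphanumeric
]

# Same filter as in the answer above.
kept = [s for s in samples if not s.isnumeric() and s.isalnum()]
print(kept == ["\u00e91"])  # True
```

Unlike \w in the regex-based answers, .isalnum() does not accept the underscore, so abc_1 is filtered out here.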

Answer 5

Score: 0

Try this:

import re

regex = r'^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]+$'

strings = ['expression123', '123expression', 'exp123ression', '1235234567544']

for string in strings:
    if re.match(regex, string):
        print(f'Matched: {string}')
    else:
        print(f'Not matched: {string}')

This gives:

Matched: expression123
Matched: 123expression
Matched: exp123ression
Not matched: 1235234567544
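Note that this expression requires at least one letter and at least one digit, which is stricter than only excluding digits-only strings. The sketch below compares it with the lookahead pattern from the first answer on a letters-only input; the names BOTH_REQUIRED and NOT_DIGITS_ONLY and the sample "expression" are assumptions added for illustration.

```python
import re

BOTH_REQUIRED = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z\d]+$")
NOT_DIGITS_ONLY = re.compile(r"^(?!\d+$)\w+$")

for s in ["exp123ression", "expression", "1235234567544"]:
    # Print the string and whether each pattern accepts it.
    print(s, bool(BOTH_REQUIRED.match(s)), bool(NOT_DIGITS_ONLY.match(s)))

# exp123ression True True
# expression False True
# 1235234567544 False False
```

Which behavior is wanted depends on whether "may contain both words and digits" in the question is read as a requirement or as a possibility.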
