delete by regex rule
Question
I have some data, and I want to delete part of it using a regex.
I want to delete every character except the digits and any period that sits between two digits.
Data as follows:
str1 = 'ABC.5,696.05'
str2 = 'xxx3,769.01'
The result should be '5696.05' and '3769.01'.
I tried re.sub(r'[^\d\.]', '', str1), but it does not delete the first '.': the result is '.5696.05'.
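A small reproduction of the attempt above makes the problem concrete. A character class judges each character in isolation, so it cannot tell a dot between two digits from any other dot:

```python
import re

str1 = 'ABC.5,696.05'
str2 = 'xxx3,769.01'

# The attempted pattern deletes everything except digits and dots,
# wherever those dots appear.
attempt = r'[^\d\.]'

print(re.sub(attempt, '', str1))  # .5696.05  (the dot after 'ABC' survives)
print(re.sub(attempt, '', str2))  # 3769.01
```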
Answer 1
Score: 1
I'm not a regex expert, so I'd chain methods:
>>> import re
>>> float(re.sub(r'^[^\d]+', '', str1).replace(',', ''))
5696.05
>>> float(re.sub(r'^[^\d]+', '', str2).replace(',', ''))
3769.01
The regex removes the non-numeric prefix at the start of each string, and a plain str.replace removes the thousands separators.
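If the cleaned value should stay a string (for example so that a trailing zero such as '3769.10' is not turned into 3769.1 by float), the same two steps can be wrapped in a small function. A sketch, where the name strip_prefix_and_commas is hypothetical and \D is simply the shorthand for [^\d]:

```python
import re

def strip_prefix_and_commas(s: str) -> str:
    # Hypothetical helper. Step 1: drop the leading run of non-digit
    # characters, e.g. 'ABC.' or 'xxx'.
    no_prefix = re.sub(r'^\D+', '', s)
    # Step 2: remove the thousands separators.
    return no_prefix.replace(',', '')

for s in ('ABC.5,696.05', 'xxx3,769.01'):
    print(strip_prefix_and_commas(s))
# 5696.05
# 3769.01
```

Like the original answer, this assumes the prefix contains no digits and that commas are the only separators inside the number.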
Answer 2
Score: 1
This can be done in two stages:
- find the segment that starts and ends with a digit;
- within it, delete everything that is not a digit or a dot.
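The two stages can also be written out as separate steps, without a callback. This is a sketch under stated assumptions: the helper name two_stage_clean is hypothetical, and stage 1 uses \d.*\d, a slight relaxation of the answer's \d.+\d that also matches two adjacent digits. Stage 2 reuses the answer's inner pattern, so stray dots inside the segment are dropped as well.

```python
import re

def two_stage_clean(s):
    # Stage 1 (hypothetical helper): locate the span from the first digit to
    # the last digit; the greedy .* backtracks until it ends on the last digit.
    m = re.search(r'\d.*\d', s)
    if m is None:
        return s  # fewer than two digits: return the input unchanged
    segment = m.group()
    # Stage 2: inside the span, delete non-digit/non-dot characters and any
    # dot that is not directly between two digits.
    return re.sub(r'[^\d.]|(?<!\d)\.|\.(?!\d)', '', segment)

print(two_stage_clean('ABC.5,696.05'))  # 5696.05
print(two_stage_clean('xxx3,769.01'))   # 3769.01
```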
You can pass a callback function to re.sub:
import re
print(re.sub(r'.*?(\d.+\d).*', lambda x: re.sub(r'[^\d.]|(?<!\d)\.|\.(?!\d)', '', x.group(1)), 'ABC.5,696.05'))
# 5696.05
Here the outer sub captures everything between the first and the last digit in a group and passes the match to the lambda.
The lambda removes:
- characters that are neither digits nor dots: [^\d.]
- dots that are not preceded by a digit: (?<!\d)\.
- dots that are not followed by a digit: \.(?!\d)
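For the two sample strings, the inner pattern on its own already gives the expected result, because the dot in the non-numeric prefix (the one after 'ABC') is never between two digits. A sketch that applies it in a single pass, written with re.VERBOSE so each alternative carries its own comment (the names STRAY and clean_number are hypothetical):

```python
import re

# One alternation; any character matching one of the branches is deleted.
STRAY = re.compile(r"""
      [^\d.]      # a character that is neither a digit nor a dot
    | (?<!\d)\.   # a dot with no digit immediately before it
    | \.(?!\d)    # a dot with no digit immediately after it
""", re.VERBOSE)

def clean_number(s):
    # Hypothetical helper: delete every match of STRAY in one pass.
    return STRAY.sub('', s)

print(clean_number('ABC.5,696.05'))  # 5696.05
print(clean_number('xxx3,769.01'))   # 3769.01
```

The lookarounds inspect the neighbours in the original string, not in the partially cleaned one, which is why a dot next to a comma (as in '5.,6') is still removed.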