How can I separate symbols [">", "<", ">=", "<="], numeric value and unit from a string by using regular expression in Python?
Question
I would like to separate the symbol, numeric value and unit in each string of a list by using a regular expression.
x = ">=40.55%"
Currently I am trying the following regex in Python to separate the symbol, numeric value, and unit:
match = re.findall(r'^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z)',i)
But it doesn't give the expected output.
Expected output:
symbol = >=
value = 40.55
unit = %
How can I use a regular expression in Python to separate a string into symbol, numeric value and unit?
Answer 1
Score: 3
Below I made some assumptions about your format, for example that numbers like .3 (to stand for 0.3) are disallowed.
import re
regex = re.compile(r'\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z')
x = ">=40.55%"
m = regex.match(x)
symbol, value, unit = m.groups()
# symbol: '>='
# value: '40.55'
# unit: '%'
# let's try to match a different string: "=-345mg/dl"
regex.match("=-345mg/dl").groups()
# output: ('=', '-345', 'mg/dl')
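Here, (?:...) denotes a non-capturing group. Note that outside MULTILINE mode a regex-initial ^ is a synonym of \A. A regex-final $ is close to \Z, but $ also matches just before a trailing newline at the end of the string, whereas \Z matches only at the very end.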
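To make the anchor behaviour concrete, here is a small sketch. The sample strings and variable names are made up for illustration; only standard `re` calls are used.

```python
import re

s = ">=40.55%\n"  # sample input with a trailing newline

# $ matches at the end of the string and also just before a final newline.
dollar_hit = re.search(r'%$', s) is not None
# \Z matches only at the very end of the string.
z_hit = re.search(r'%\Z', s) is not None
print(dollar_hit, z_hit)  # True False

text = "first\n>=40.55%"
# Without MULTILINE, ^ only matches at the start of the string, like \A.
caret_default = re.findall(r'^>=', text)
# With MULTILINE, ^ also matches right after each newline; \A still does not.
caret_multiline = re.findall(r'^>=', text, re.MULTILINE)
a_multiline = re.findall(r'\A>=', text, re.MULTILINE)
print(caret_default, caret_multiline, a_multiline)  # [] ['>='] []
```

If the input may carry a stray trailing newline and you want to reject it, \Z is the stricter choice.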
Check out the official Regular Expression HOWTO.
Credit goes to user Pranav Hosangadi for suggesting to match an optional minus sign to capture negative numbers.
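Since the question mentions a list of strings, here is a minimal sketch of applying the same pattern to several inputs. The helper name `parse_all` and the sample list are assumptions for illustration; the point is that `match()` returns `None` for non-matching strings, so it needs a guard before `groups()`.

```python
import re

# Same pattern as in the answer above.
MEASUREMENT = re.compile(r'\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z')

def parse_all(strings):
    """Return a (symbol, value, unit) tuple per string, or None if it does not match."""
    parsed = []
    for s in strings:
        m = MEASUREMENT.match(s)
        # match() returns None on failure, so guard before calling groups().
        parsed.append(m.groups() if m else None)
    return parsed

samples = [">=40.55%", "<12mg/dl", "=-345mg/dl", "40.55%", ">=.3%"]
results = parse_all(samples)
for s, r in zip(samples, results):
    print(s, "->", r)
# >=40.55% -> ('>=', '40.55', '%')
# <12mg/dl -> ('<', '12', 'mg/dl')
# =-345mg/dl -> ('=', '-345', 'mg/dl')
# 40.55% -> None
# >=.3% -> None
```

The last two inputs fail on purpose: one has no symbol, and the other uses the .3 form that the pattern disallows.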
Answer 2
Score: 1
Here's an answer that doesn't require you to pre-define the allowable units. Anything after the number is considered a unit.
Regex and explanation (Try online):
^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$
-------------------------------
^                             $ : Start and end of string or line
 (      )(               )(  )  : Capturing groups for each portion of the string
  [<>]                          : Less than or greater than symbol
      =?                        : Optional equal symbol
          -?                    : Optional minus sign
            \d+                 : One or more digits
               (?:     )?       : Optional non-capturing group
                  \.\d+         : Decimal point followed by one or more digits
                           .*   : Any number of any character
Code:
import re
result = re.findall(r"^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$", ">=40.55%")
if result:
    symbol, value, unit = result[0]
which gives:
symbol = '>='
value = '40.55'
unit = '%'
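Because the unit group is just (.*), it accepts whatever follows the number, including nothing at all. A rough sketch of what that means in practice follows. The helper name `split_measurement` and the `float()` conversion are my additions, not part of the answer.

```python
import re

pattern = re.compile(r"^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$")

def split_measurement(s):
    """Return (symbol, value as float, unit), or None if s does not match."""
    m = pattern.match(s)
    if m is None:
        return None
    symbol, value, unit = m.groups()
    return symbol, float(value), unit

a = split_measurement(">=40.55%")  # ('>=', 40.55, '%')
b = split_measurement("<7 kg/m2")  # ('<', 7.0, ' kg/m2'), leading space kept
c = split_measurement(">3")        # ('>', 3.0, ''), empty unit
d = split_measurement(">=40.%")    # ('>=', 40.0, '.%'), stray dot lands in the unit
e = split_measurement("=5%")       # None, a bare "=" is not covered by [<>]=?
print(a, b, c, d, e, sep="\n")
```

If an empty or odd-looking unit should be rejected, you would need to validate the third group afterwards or go back to an explicit list of units.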
Answer 3
Score: 1
A few notes about the pattern ^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z) and why it does not give the expected output:
- This (\A[>|<])* optionally repeats a single char being one of > | < at the start of the string, which will only match a single char at the most. The \A should be at the start of the pattern, and you should repeat the character class inside of the capture group
- This [>|<] does not match a = char
- This [%|mg/dl|cm2] does match the single % but it does not mean matching alternatives with the |, it is a character class matching one of % | m etc...
- This (\d+[.]+\d+)* matches a mandatory decimal part, but note that repeating a capture group captures the value of the last iteration. So if you would have for example this string >=40.55.2% the capture group value would be 5.2
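The notes above can be checked directly in the interpreter. This is only a sketch that tests the sub-patterns in isolation with `re.fullmatch`. The variable names are made up, and the last pattern appends a literal % to the repeated group to keep the example short.

```python
import re

# [%|mg/dl|cm2] is a character class: it matches exactly one character
# from the set % | m g / d l c 2, not the alternatives "mg/dl" or "cm2".
char_class = re.fullmatch(r'[%|mg/dl|cm2]', 'mg/dl')     # None
single_char = re.fullmatch(r'[%|mg/dl|cm2]', 'm')        # a match
alternation = re.fullmatch(r'(?:%|mg/dl|cm2)', 'mg/dl')  # a match

# [>|<] does not contain "=", so repeating it cannot cover ">=".
no_equals = re.fullmatch(r'[>|<]*', '>=')                # None

# A repeated capture group keeps only its last iteration.
m = re.fullmatch(r'(\d+[.]+\d+)*%', '40.55.2%')
last_iteration = m.group(1)
print(last_iteration)  # 5.2
```

The group ends up as 5.2 because the engine backtracks: the first iteration gives up a digit and matches 40.5 so that a second iteration can match 5.2.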
Example using named capture groups:
import re
pattern = re.compile(r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z")
s = ">=40.55%"
m = pattern.match(s)
if m:
    print(m.groupdict())
Output
{'symbol': '>=', 'value': '40.55', 'unit': '%'}
The pattern explained:
\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z
- \A Start of string
- (?P<symbol>[<>]=?) Named group symbol, match one of < > and optional =
- (?P<value>\d+(?:\.\d+)*) Named group value, match 1+ digits and optionally repeat . and 1+ digits
- (?P<unit>%|mg/dl|cm2) Named group unit, match 1 of the alternatives
- \Z End of string
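One possible way to use the named groups is to unpack `groupdict()` into a small record type. The `Measurement` class and `parse` helper below are hypothetical names, not part of the answer. Note that the value stays a string here, because the `(?:\.\d+)*` repetition also accepts inputs such as 1.2.3 that `float()` would reject.

```python
import re
from typing import NamedTuple, Optional

class Measurement(NamedTuple):  # hypothetical record type for illustration
    symbol: str
    value: str
    unit: str

pattern = re.compile(
    r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z"
)

def parse(s: str) -> Optional[Measurement]:
    """Return a Measurement for a matching string, otherwise None."""
    m = pattern.match(s)
    # The group names match the field names, so groupdict() unpacks directly.
    return Measurement(**m.groupdict()) if m else None

first = parse(">=40.55%")
dotted = parse("<1.2.3cm2")
missing = parse("=5%")
print(first)    # Measurement(symbol='>=', value='40.55', unit='%')
print(dotted)   # Measurement(symbol='<', value='1.2.3', unit='cm2')
print(missing)  # None
```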
Answer 4
Score: 0
Try
match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+\.\d+)(\%|mg/dl|cm2)(?:\Z|\s)',i)
matches xx.x
match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)',i)
matches xx.x and xx, e.g. i = "something >=40.55% or =30cm2 etc." ==> result: [('>=', '40.55', '%'), ('=', '30', 'cm2')]
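One caveat with (?:\A|\s) and (?:\Z|\s): they consume the surrounding whitespace, so when two values are separated by a single space the second one is skipped. A possible variant, which is my assumption and not part of the answer, uses the lookarounds (?<!\S) and (?!\S) instead:

```python
import re

i = ">=40.55% =30cm2 <=7mg/dl"  # made-up sample with single spaces between values

# Pattern from the answer: the trailing (?:\Z|\s) consumes the space that the
# next match would need as its leading \s.
consuming = re.findall(
    r'(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)', i)

# (?<!\S) means "start of string or preceded by whitespace" and
# (?!\S) means "end of string or followed by whitespace"; neither consumes it.
lookaround = re.findall(
    r'(?<!\S)(=|<=|>=)(\d+(?:\.\d+)?)(%|mg/dl|cm2)(?!\S)', i)

print(consuming)   # [('>=', '40.55', '%'), ('<=', '7', 'mg/dl')]
print(lookaround)  # [('>=', '40.55', '%'), ('=', '30', 'cm2'), ('<=', '7', 'mg/dl')]
```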