How can I separate symbols [">", "<", ">=", "<="], numeric value and unit from a string by using regular expression in Python?

Question

I would like to separate the comparison symbol, the numeric value, and the unit from each string in a list by using a regular expression.

    x = ">=40.55%"

Currently I use the following regex in Python to separate the symbol, numeric value, and unit:

    match = re.findall(r'^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z)', x)

But it doesn't give the expected output.

Expected output:

    symbol = >=
    value = 40.55
    unit = %

How can I use a regular expression in Python to separate a string into symbol, numeric value, and unit?

Answer 1

Score: 3

Below I made some assumptions about your format, for example that numbers like .3 (to stand for 0.3) are disallowed.

    import re

    regex = re.compile(r'\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z')

    x = ">=40.55%"
    m = regex.match(x)
    symbol, value, unit = m.groups()
    # symbol: '>='
    # value: '40.55'
    # unit: '%'

    # let's try to match a different string: "=-345mg/dl"
    regex.match("=-345mg/dl").groups()
    # output: ('=', '-345', 'mg/dl')

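Here, (?:...) denotes a non-capturing group. Note that a regex-initial ^ is a synonym of \A except in MULTILINE mode, where ^ also matches at the start of every line. A regex-final $ is close to \Z but not identical: even without MULTILINE, $ also matches just before a trailing newline at the end of the string, whereas \Z matches only at the very end.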
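The anchor behaviour can be checked directly. The sketch below uses made-up sample strings (they are assumptions, not taken from the question) to show where ^ and $ differ from \A and \Z: in MULTILINE mode, and, for $, when the string ends with a newline.

```python
import re

# Sample inputs are illustrative assumptions, not data from the question.
with_newline = ">=40.55%\n"

# `$` also matches just before a trailing newline; `\Z` only at the very end.
dollar_hit = re.search(r"%$", with_newline) is not None
big_z_hit = re.search(r"%\Z", with_newline) is not None
print(dollar_hit, big_z_hit)  # True False

two_lines = "<5cm2\n>=40.55%"

# Without MULTILINE, `^` only matches at the start of the string.
plain = re.findall(r"^[<>]=?", two_lines)
# With MULTILINE, `^` matches at the start of every line, `\A` still does not.
multi_caret = re.findall(r"^[<>]=?", two_lines, re.MULTILINE)
multi_big_a = re.findall(r"\A[<>]=?", two_lines, re.MULTILINE)
print(plain, multi_caret, multi_big_a)  # ['<'] ['<', '>='] ['<']
```

If the input may carry a trailing newline (for example, lines read from a file), \Z is the stricter choice; calling .strip() first is another option.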

Check out the official Regular Expression HOWTO.

Credit goes to user Pranav Hosangadi for suggesting to match an optional minus sign to capture negative numbers.
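Since the question mentions a list of strings, here is a minimal sketch of how the pattern above might be applied to several inputs while guarding against non-matching strings. The helper name split_measurement and the sample list are hypothetical, and converting the value to float is an assumption about what is wanted.

```python
import re

# Same pattern as in the answer above.
REGEX = re.compile(r"\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z")


def split_measurement(text):
    """Return (symbol, value, unit), or None if `text` does not match.

    Hypothetical helper: the value is converted to float as a convenience.
    """
    m = REGEX.match(text)
    if m is None:
        return None
    symbol, value, unit = m.groups()
    return symbol, float(value), unit


# Illustrative inputs; the last two are deliberately invalid.
samples = [">=40.55%", "<5cm2", "=-345mg/dl", "40.55%", ">=.3%"]
results = [split_measurement(s) for s in samples]
for s, r in zip(samples, results):
    print(s, "->", r)
```

Calling m.groups() directly on a failed match raises AttributeError because match() returns None, so the explicit check is worth keeping whenever the input is not guaranteed to be well-formed.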

Answer 2

Score: 1

Here's an answer that doesn't require you to pre-define the allowable units. Anything after the number is considered a unit.

Regex and explanation:

    ^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$
    -------------------------------
    ^                             $ : Start and end of string or line
     (      )(               )(  )  : Capturing groups for each portion of the string
      [<>]                          : Less than or greater than symbol
          =?                        : Optional equal symbol
              -?                    : Optional minus sign
                \d+                 : One or more digits
                   (?:     )?       : Optional non-capturing group
                      \.\d+         : Decimal point followed by one or more digits
                               .*   : Any number of any character

Code:

    import re

    result = re.findall(r"^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$", ">=40.55%")
    if result:
        symbol, value, unit = result[0]

which gives:

    symbol = '>='
    value = '40.55'
    unit = '%'
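The same explanation can live inside the pattern itself by using re.VERBOSE, which ignores whitespace in the pattern and allows comments. This is only a sketch of that idea; the name PATTERN and the extra sample strings are assumptions added for illustration.

```python
import re

# Same regex as above, written with re.VERBOSE so the notes sit in the pattern.
PATTERN = re.compile(
    r"""
    ^
    ([<>]=?)            # symbol: < or >, optionally followed by =
    (-?\d+(?:\.\d+)?)   # value: optional minus, digits, optional decimal part
    (.*)                # unit: whatever remains
    $
    """,
    re.VERBOSE,
)

# Illustrative inputs (not from the question).
samples = [">=40.55%", "<-3 mmol/L", "=30cm2"]
results = []
for text in samples:
    m = PATTERN.match(text)
    results.append(m.groups() if m else None)
    print(text, "->", results[-1])
```

Two things show up here: the unit group keeps any leading whitespace (' mmol/L'), and a bare = is not accepted as a symbol because the pattern requires < or > first.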

Answer 3

Score: 1

A few notes on why the pattern ^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z) does not give the expected output:

  • (\A[>|<])* optionally repeats a single character, one of > | <, anchored at the start of the string, so it can match one character at most. The \A should be at the start of the pattern, and you should repeat the character class inside the capture group
  • [>|<] does not match a = character
  • [%|mg/dl|cm2] does match a single %, but the | does not create alternatives here: it is a character class that matches one of the characters % | m etc.
  • (\d+[.]+\d+)* requires a decimal part whenever it matches, and note that a repeated capture group keeps only the value of its last iteration. So for a string such as >=40.55.2% the capture group value would be 5.2

Example using named capture groups:

    import re

    pattern = re.compile(r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z")
    s = ">=40.55%"
    m = pattern.match(s)
    if m:
        print(m.groupdict())

Output:

    {'symbol': '>=', 'value': '40.55', 'unit': '%'}

The pattern explained:

    \A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z

  • \A Start of string
  • (?P<symbol>[<>]=?) Named group symbol, matches one of < > and an optional =
  • (?P<value>\d+(?:\.\d+)*) Named group value, matches 1+ digits and optionally repeats . and 1+ digits
  • (?P<unit>%|mg/dl|cm2) Named group unit, matches 1 of the alternatives
  • \Z End of string
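Two of the points above are easy to verify in isolation. The snippet below is a small sketch: it tests the unit part and the number part on their own (not the full pattern from the question), and the variable names are made up for illustration.

```python
import re

unit_class = r"[%|mg/dl|cm2]"          # character class: exactly ONE character
unit_alternation = r"(?:%|mg/dl|cm2)"  # alternation: one of three strings

class_matches_unit = re.fullmatch(unit_class, "mg/dl") is not None
class_matches_pipe = re.fullmatch(unit_class, "|") is not None
alternation_matches_unit = re.fullmatch(unit_alternation, "mg/dl") is not None
print(class_matches_unit, class_matches_pipe, alternation_matches_unit)
# False True True

# A repeated capture group keeps only its last iteration.
m = re.fullmatch(r"(\d+[.]+\d+)*", "40.55.2")
last_iteration = m.group(1)
print(m.group(0), "->", last_iteration)  # 40.55.2 -> 5.2
```

To match the whole string, the engine has to split it as 40.5 followed by 5.2, and only the second piece survives in the group.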
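One consequence of (?:\.\d+)* worth knowing: it accepts values with several dots, which float() cannot parse. The following sketch shows that, along with reading named groups by subscript; the strict variant with ? instead of * is an assumed alternative, not part of the answer.

```python
import re

pattern = re.compile(
    r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z"
)

# Named groups can be read by subscript on the match object (Python 3.6+).
m = pattern.match("<5cm2")
parts = (m["symbol"], m["value"], m["unit"])
print(parts)  # ('<', '5', 'cm2')

# `*` lets the dotted part repeat, so this is accepted as a "value".
dotted = pattern.match(">=40.55.2%")["value"]
print(dotted)  # 40.55.2

# Assumed stricter variant: `?` allows at most one decimal part.
strict = re.compile(
    r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)?)(?P<unit>%|mg/dl|cm2)\Z"
)
strict_result = strict.match(">=40.55.2%")
print(strict_result)  # None
```

Which variant fits depends on whether multi-dot values such as version-like numbers are expected in the data.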


Answer 4

Score: 0

Try

    match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+\.\d+)(\%|mg/dl|cm2)(?:\Z|\s)', i)

which matches values of the form xx.x, or

    match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)', i)

which matches both xx.x and xx. For example, i = "something >=40.55% or =30cm2 etc." gives the result [('>=', '40.55', '%'), ('=', '30', 'cm2')]
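One caveat with this approach: (?:\A|\s) and (?:\Z|\s) consume the surrounding whitespace, so two measurements separated by a single space cannot both match, and a bare < or > is not in the symbol alternation. A possible variant, offered as a sketch rather than as the answerer's method, uses lookarounds instead. The sample text is made up.

```python
import re

# Illustrative text with measurements separated by single spaces.
text = "something >=40.55% =30cm2 <5mg/dl etc."

# Pattern from the answer above: the boundaries consume whitespace.
consuming = re.findall(
    r"(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)", text
)

# Assumed variant: lookarounds check the boundaries without consuming them,
# and the alternation also lists bare < and > (longer symbols first).
LOOKAROUND = re.compile(
    r"(?<!\S)(<=|>=|=|<|>)(\d+(?:\.\d+)?)(%|mg/dl|cm2)(?!\S)"
)
non_consuming = LOOKAROUND.findall(text)

print(consuming)      # [('>=', '40.55', '%')]
print(non_consuming)  # [('>=', '40.55', '%'), ('=', '30', 'cm2'), ('<', '5', 'mg/dl')]
```

In the consuming version, the space after 40.55% is used up by the first match, so =30cm2 no longer has a whitespace character in front of it to match against.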

huangapple
  • Posted on March 4, 2023, 00:10:18
  • Source link: https://go.coder-hub.com/75629418.html