March 4, 2023 00:10:18 · go

How can I separate symbols [">", "<", ">=", "<="], numeric value and unit from a string by using regular expression in Python?

Question

I would like to separate the symbol, numeric value, and unit from each string in a list using a regular expression.

    x = ">=40.55%"

Currently I am using the following regex in Python to separate the symbol, numeric value, and unit:

    match = re.findall(r'^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z)', i)

But it doesn't give the expected output.

Expected output:

    symbol = >=
    value = 40.55
    unit = %

How can I use a regular expression in Python to separate a string into its symbol, numeric value, and unit?

Answer 1

Score: 3

Below I made some assumptions about your format, for example that numbers like .3 (to stand for 0.3) are disallowed.

import re

regex = re.compile(r'\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z')

x = ">=40.55%"
m = regex.match(x)
symbol, value, unit = m.groups()
# symbol: '>='
# value: '40.55'
# unit: '%'

# let's try to match a different string: "=-345mg/dl"
regex.match("=-345mg/dl").groups()
# output: ('=', '-345', 'mg/dl')

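Here, (?:...) denotes a non-capturing group. Note that a regex-initial ^ is a synonym of \A except in MULTILINE mode. A regex-final $ is close to \Z but not identical: in Python, $ also matches just before a newline at the end of the string, whereas \Z matches only at the very end.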
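A quick check of how the two end anchors treat a string that ends with a newline (the input is illustrative, and the pattern is a simplified version of the one above):

```python
import re

s = ">=40.55%\n"  # note the trailing newline

# $ may match just before a final newline...
with_dollar = re.match(r"\A([<>]=?)(\d+(?:\.\d+)?)(%)$", s)
# ...whereas \Z only matches at the very end of the string
with_z = re.match(r"\A([<>]=?)(\d+(?:\.\d+)?)(%)\Z", s)

print(with_dollar.groups())  # ('>=', '40.55', '%')
print(with_z)                # None
```

If inputs may carry stray newlines (for example lines read from a file), \Z rejects them, while $ silently accepts them.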

Check out the official Regular Expression HOWTO.

Credit goes to user Pranav Hosangadi for suggesting an optional minus sign to capture negative numbers.
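The question mentions a list of strings. Below is a minimal sketch that applies this answer's pattern to each item. It assumes that non-matching items (such as `.3%`, which this format disallows, or an unknown unit) should map to `None` and that the value should be converted to `float`. The helper `parse_measurement` and the sample list are hypothetical:

```python
import re

# Same pattern as in this answer: symbol, optional minus + number, then a known unit
regex = re.compile(r'\A(=|<=|>=|<|>)(-?\d+(?:\.\d+)?)(%|mg/dl|cm2)\Z')

def parse_measurement(text):
    """Hypothetical helper: return (symbol, value, unit) or None if the string does not match."""
    m = regex.match(text)
    if m is None:
        return None
    symbol, value, unit = m.groups()
    return symbol, float(value), unit  # numeric part converted for further use

samples = [">=40.55%", "<5mg/dl", "=-345mg/dl", ".3%", ">=40.55kg"]
for s in samples:
    print(s, "->", parse_measurement(s))
```

Returning `None` instead of calling `.groups()` directly avoids an `AttributeError` when `match()` finds nothing.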

Answer 2

Score: 1

Here's an answer that doesn't require you to pre-define the allowable units. Anything after the number is considered a unit.

Regex and explanation (Try online):

^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$
-------------------------------
^                             $ : Start and end of string or line
 (      )(               )(  )  : Capturing groups for each portion of the string
  [<>]                          : Less than or greater than symbol
      =?                        : Optional equal symbol
          -?                    : Optional minus sign
            \d+                 : One or more digits
               (?:     )?       : Optional non-capturing group
                  \.\d+         : Decimal point followed by one or more digits
                           .*   : Any number of any character
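The same pattern can also carry its explanation inline. Here is a sketch using Python's `re.VERBOSE` flag, under which unescaped whitespace in the pattern is ignored and `#` starts a comment:

```python
import re

pattern = re.compile(r"""
    ^
    ([<>]=?)            # symbol: < or >, optionally followed by =
    (-?\d+(?:\.\d+)?)   # value: optional minus, digits, optional decimal part
    (.*)                # unit: anything after the number
    $
""", re.VERBOSE)

print(pattern.findall(">=40.55%"))  # [('>=', '40.55', '%')]
```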

Code:

import re

result = re.findall(r"^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$", ">=40.55%")
if result:
    symbol, value, unit = result[0]

which gives:

symbol = '>='
value = '40.55'
unit = '%'
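Because the last group is `(.*)`, `unit` receives whatever follows the number verbatim. That may be an empty string or text with a leading space. A small sketch of that behavior follows; the helper `split_measurement` and the inputs are hypothetical, and whether to call `.strip()` on the unit is left to the caller:

```python
import re

PATTERN = r"^([<>]=?)(-?\d+(?:\.\d+)?)(.*)$"

def split_measurement(text):
    """Hypothetical helper: return (symbol, value, unit) or None if there is no match."""
    result = re.findall(PATTERN, text)
    if not result:
        return None
    symbol, value, unit = result[0]
    return symbol, value, unit

for text in [">=40.55%", "<5", ">=40.55 mg/dl", "40%"]:
    # "<5" gives an empty unit; ">=40.55 mg/dl" keeps the leading space in " mg/dl"
    print(repr(text), "->", split_measurement(text))
```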

Answer 3

Score: 1

A few notes on why the pattern ^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z) does not give the expected output:

(\A[>|<])* optionally repeats a group that matches a single character, one of >, | or <, at the start of the string. Because of the \A inside the group, it can match at most one character. The \A should be at the start of the pattern, and the character class should be repeated inside the capture group instead.
[>|<] does not match the = character.
[%|mg/dl|cm2] does match the single %, but the | inside it does not separate alternatives: it is a character class matching one character from %, |, m, g, /, d, l, c and 2.
(\d+[.]+\d+)* requires a decimal part in each repetition, and note that a repeated capture group only captures the value of the last iteration. For example, with the string >=40.55.2% the capture group value would be 5.2.
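The points above can be checked directly. This short sketch runs the question's original pattern and its pieces against sample inputs (the inputs are illustrative):

```python
import re

original = r'^(\A[>|<])*(\d+[.]+\d+)*([%|mg/dl|cm2]\Z)'

# The whole pattern finds nothing: "=" is matched by none of its parts
print(re.findall(original, ">=40.55%"))                 # []

# [>|<] is a character class without "="
print(re.fullmatch(r"[>|<]", "="))                      # None

# [%|mg/dl|cm2] matches exactly one character, so "mg/dl" as a whole fails...
print(re.fullmatch(r"[%|mg/dl|cm2]", "mg/dl"))          # None
# ...while a lone "|" is accepted
print(re.fullmatch(r"[%|mg/dl|cm2]", "|") is not None)  # True

# A repeated capture group keeps only its last iteration
print(re.fullmatch(r"(\d+[.]+\d+)*", "40.55.2").group(1))  # 5.2
```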

Example using named capture groups:

import re

pattern = re.compile(r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z")
s = ">=40.55%"
m = pattern.match(s)
if m:
    print(m.groupdict())

Output:

{'symbol': '>=', 'value': '40.55', 'unit': '%'}

The pattern explained:

\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z

\A Start of string
(?P<symbol>[<>]=?) Named group symbol, match one of < > and optional =
(?P<value>\d+(?:\.\d+)*) Named group value, match 1+ digits and optionally repeat . and 1+ digits
(?P<unit>%|mg/dl|cm2) Named group unit, match 1 of the alternatives
\Z End of string

Regex demo | Python demo
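A sketch reusing the named-group pattern above on a few more inputs. The helper `parse_named` and the input list are hypothetical. It also shows that `\d+(?:\.\d+)*` accepts values with several dots, such as `1.2.3`, which you may want to tighten depending on your data:

```python
import re

pattern = re.compile(r"\A(?P<symbol>[<>]=?)(?P<value>\d+(?:\.\d+)*)(?P<unit>%|mg/dl|cm2)\Z")

def parse_named(text):
    """Hypothetical helper: return the named groups as a dict, or None when there is no match."""
    m = pattern.match(text)
    return m.groupdict() if m else None

for text in [">=40.55%", "<=5mg/dl", ">1.2.3cm2", "=40%"]:
    # "=40%" fails because [<>]=? requires < or > first
    print(text, "->", parse_named(text))
```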

Answer 4

Score: 0

Try

match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+\.\d+)(\%|mg/dl|cm2)(?:\Z|\s)', i)

which matches xx.x, or

match = re.findall(r'(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)', i)

which matches both xx.x and xx, e.g. i = "something >=40.55% or =30cm2 etc." ==> result: [('>=', '40.55', '%'), ('=', '30', 'cm2')]
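One caveat with `(?:\A|\s)` and `(?:\Z|\s)` is that both consume the surrounding whitespace, so two measurements separated by a single space cannot both match. Below is a sketch of a variant that swaps in the lookarounds `(?<!\S)` and `(?!\S)`. These check for whitespace or a string boundary without consuming it. This is a swapped-in technique, not the answer's original regex, and the input string is illustrative:

```python
import re

# The answer's second regex: the separating whitespace is consumed
consuming = r'(?:\A|\s)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?:\Z|\s)'
# (?<!\S): preceded by whitespace or the start; (?!\S): followed by whitespace or the end
lookaround = r'(?<!\S)(=|<=|>=)(\d+(?:\.\d+)?)(\%|mg/dl|cm2)(?!\S)'

i = ">=40.55% =30cm2"
print(re.findall(consuming, i))   # [('>=', '40.55', '%')]
print(re.findall(lookaround, i))  # [('>=', '40.55', '%'), ('=', '30', 'cm2')]
```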
