May 11, 2023 14:27:29

Looking for a way to split strings in pandas dataframe depending on OR, AND and parentheses

Question

I have a pandas DataFrame with one column of components and one column of constraints. The constraints decide which category each component is filtered into. They are not very straightforward, so I'm looking for a way to split them into multiple smaller, more readable constraints. For example, if a constraint is 'A and (B or C)', I want to split it into two rows, 'A and B' and 'A and C'. Not all constraints are as simple as this example, though.

Here's what a small selection of the dataframe might look like:

Component	Constraint
123	A and (B or C)
456	((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'
789	LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))

import pandas as pd
dataex = {'Component': [123,
                        456,
                        789],
          'Constraint': ["A and (B or C)",
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'",
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)

As I said, I'm hoping to split all of these into multiple rows (where needed), depending on the ands, ors and parentheses in the constraint. This is the result I have in mind:

Component	Constraint
123	A and B
123	A and C
456	STEERWHEEL_LOCK='NO' and MIRROR='ELECTRIC' and MIRRORCAMERA!='NO'
456	STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_RIGHT!='NO'
456	STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_LEFT!='NO'
789	LENGTH='122'
789	LENGTH='135' and BATTERY='551'
789	LENGTH='149'
789	LENGTH='181' and BATTERY='674'
789	LENGTH='181' and BATTERY='551' and CHARGER!='NO'

import pandas as pd
datares = {'Component': [123, 123, 456, 456, 456, 789, 789, 789, 789, 789],
           'Constraint': ["A and B",
                          "A and C",
                          "STEERWHEEL_LOCK='NO' and MIRROR='ELECTRIC' and MIRRORCAMERA!='NO'",
                          "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_RIGHT!='NO'",
                          "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_LEFT!='NO'",
                          "LENGTH='122'",
                          "LENGTH='135' and BATTERY='551'",
                          "LENGTH='149'",
                          "LENGTH='181' and BATTERY='674'",
                          "LENGTH='181' and BATTERY='551' and CHARGER!='NO'"
                          ]}
df_result = pd.DataFrame(data=datares)

I've tried splitting the constraints on 'or', dividing them into arrays, and looping over them to get the result, but with the more difficult constraints you get arrays inside arrays inside arrays, and it gets very messy after a while.
I've also tried making a sort of logic tree, but I haven't gotten that to work in Python yet.

I'm hoping some of you might have a good idea or module to help me with my problem.
Thanks!

Answer 1

Score: 0

From your description, I think you need to put the expression in "disjunctive normal form" (DNF), which looks like Or(And(v1,v2), And(v1,v4), ...). The code below does this using an extra package, pyeda (DNF is common in electronic design automation). To install the package, run pip3 install pyeda.
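
To see why splitting into DNF rows is safe, here is a minimal standard-library sketch (not part of the pyeda solution) that checks the question's first example by brute force. The function names `original` and `split` are made up for this illustration.

```python
from itertools import product

def original(a, b, c):
    # The constraint as written: A and (B or C)
    return a and (b or c)

def split(a, b, c):
    # The two rows of the desired result, joined by "or"
    return (a and b) or (a and c)

# Try every combination of truth values for A, B and C.
rows = list(product([False, True], repeat=3))
print(all(original(*r) == split(*r) for r in rows))  # True
```

Each output row is one And(...) term of the DNF, and the rows together are implicitly joined by "or".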

The code that splits an expression into the corresponding And expressions is below.

Notes:

* The regular expressions that match the comparisons might need adjusting if your filters contain more than letters, digits, quotes and underscores (a question mark, etc.).
* You gave no example of how to negate a standalone variable. I used "not(A)".
* If you had no standalone variables (only comparisons), the code would be much simpler.
* If you have more operators (less than, etc.), the code will be slightly more complex.
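
As a rough illustration of the first and last notes, the pattern below accepts any characters inside the quotes and a few more operators. The grammar (a name, an operator, then a single-quoted value) and the sample text are assumptions made for this sketch, not something taken from the question's data.

```python
import re

# Assumed grammar: NAME, an operator, then a single-quoted value.
# Two-character operators are listed first so "<=" is not read as "<".
COMPARISON = re.compile(r"(\w+)\s*(!=|<=|>=|=|<|>)\s*'([^']*)'")

text = "PRICE<='9.99' and (NOTE!='why?' or SIZE>'10')"
for name, op, value in COMPARISON.findall(text):
    print(name, op, value)
# PRICE <= 9.99
# NOTE != why?
# SIZE > 10
```

Only `!=` has an obvious negated form in the code below, so adding operators such as `<=` would also mean deciding how each one is negated.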

Overall idea:

* Transform your expression into a boolean expression of the form (x & y) | z. This is done by turning every comparison into a variable.
* Transform the boolean expression into DNF.
* Replace the variables with the original comparisons again.
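
The first step can be sketched on its own with the standard library. This is a simplified version: the helper name `to_placeholders` and the quoted-value pattern are assumptions, and unlike the full code it keeps a `!=` comparison as its own placeholder instead of rewriting it as a negation.

```python
import re

# Assumed shape of a comparison: NAME='value' or NAME!='value'
COMPARISON = re.compile(r"\w+!?='[^']*'")

def to_placeholders(constraint):
    """Replace each comparison with v0, v1, ... and return the mapping back."""
    mapping = {}  # placeholder -> original comparison
    seen = {}     # original comparison -> placeholder

    def repl(match):
        text = match.group(0)
        if text not in seen:
            name = f"v{len(seen)}"
            seen[text] = name
            mapping[name] = text
        return seen[text]

    return COMPARISON.sub(repl, constraint), mapping

expr_text, mapping = to_placeholders("(LENGTH='135' and BATTERY='551') or LENGTH='135'")
print(expr_text)  # (v0 and v1) or v0
print(mapping)    # {'v0': "LENGTH='135'", 'v1': "BATTERY='551'"}
```

Reusing the same placeholder for a repeated comparison matters, because the boolean library can only simplify terms it recognises as the same variable.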

from pyeda.inter import expr
import re
import pandas as pd

dataex = {'Component': [123,
                        456,
                        789],
          'Constraint': ["A and (B or C)",
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'",
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)


def transform_expr(input_expression):
    def move_not_before(x):
        # Rewrite X!='v' as ~X='v' so the inequality becomes a boolean NOT
        if '!=' in x.group(1):
            return '~' + x.group(1).replace("!=", "=")
        else:
            return x.group(1)

    expr1 = re.sub("([a-zA-Z0-9'_]*!=[a-zA-Z0-9'_]*)", move_not_before, input_expression)
    variables = {}  # placeholder name -> comparison
    values = {}     # comparison -> placeholder name
    current_key = 0
    expr2 = ""
    last_idx = 0
    # Make transformations to reach a boolean form
    for idx in re.finditer("([a-zA-Z0-9'_]*=[a-zA-Z0-9'_]*)", expr1):
        expr2 += expr1[last_idx:idx.span(1)[0]]
        if idx[1] in values:
            expr2 += values[idx[1]]
        else:
            expr2 += f'v{current_key}'
            variables[f'v{current_key}'] = idx[1]
            values[idx[1]] = f'v{current_key}'
            current_key += 1
        last_idx = idx.span(1)[1]
    expr2 += expr1[last_idx:]
    # Word boundaries keep "and"/"or" inside a standalone name (e.g. COLOR) intact
    expr3 = re.sub(r"\band\b", "&", expr2)
    expr4 = re.sub(r"\bor\b", "|", expr3)
    expr5 = expr(expr4).to_dnf()
    result = []
    # We know expr5 is like Or(And(...),And(...)...), xs has the children.
    # A constraint without any "or" does not have that shape and is not handled by this loop.
    for v in expr5.xs:
        # We remove the And(...)
        if "," in str(v):
            arr = str(v)[4:-1].replace(" ", "").split(",")
        else:
            # For cases in which you have only one variable
            arr = [str(v)]
        r = []
        for x in arr:
            if x[0] == "~":
                variable_name = x[1:]
            else:
                variable_name = x
            if variable_name not in variables:
                # How do we negate a standalone variable?
                if x[0] == "~":
                    variable_value = f"not({variable_name})"
                else:
                    variable_value = variable_name
            else:
                variable_value = variables[variable_name]
            if x[0] == "~":
                r.append(variable_value.replace("=", "!="))
            else:
                r.append(variable_value)
        result.append(" and ".join(r))
    return result


df_example['Constraint'] = df_example['Constraint'].map(transform_expr)
df_result = df_example.explode('Constraint')
pd.set_option("display.max_colwidth", None)
print(df_result)

And it will print:

   Component                                                         Constraint
0        123                                                            A and B
0        123                                                            A and C
1        456  MIRROR='ELECTRIC' and MIRRORCAMERA!='NO' and STEERWHEEL_LOCK='NO'
1        456    MIRROR='MANUAL' and MIRROR_RIGHT!='NO' and STEERWHEEL_LOCK='NO'
1        456     MIRROR='MANUAL' and MIRROR_LEFT!='NO' and STEERWHEEL_LOCK='NO'
2        789                                                       LENGTH='122'
2        789                                                       LENGTH='149'
2        789                                     LENGTH='135' and BATTERY='551'
2        789                                     LENGTH='181' and BATTERY='674'
2        789                   BATTERY='551' and LENGTH='181' and CHARGER!='NO'
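
If installing pyeda is not an option, the same splitting can be sketched with the standard library alone. This is a different technique from the code above: a small recursive-descent parser that expands "and" over "or" while it parses. The names `TOKEN` and `split_constraint` are made up for this sketch. It assumes constraints use only "and", "or", parentheses, bare names and NAME='value' / NAME!='value' comparisons; it does not handle "not", does not simplify or deduplicate terms the way a boolean library can, and silently skips characters the token pattern does not recognise.

```python
import re
from itertools import product

# Assumed token shapes: parentheses, NAME='value', NAME!='value', or a bare word
# (which covers the keywords "and"/"or" and standalone variables such as A).
TOKEN = re.compile(r"\(|\)|\w+!?='[^']*'|\w+")


def split_constraint(text):
    """Return the constraint as a list of 'x and y and ...' strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of constraint")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        # OR keeps the alternatives of both sides side by side.
        alternatives = parse_and()
        while peek() == "or":
            take()
            alternatives = alternatives + parse_and()
        return alternatives

    def parse_and():
        # AND pairs every alternative on the left with every one on the right.
        result = parse_primary()
        while peek() == "and":
            take()
            right = parse_primary()
            result = [lhs + rhs for lhs, rhs in product(result, right)]
        return result

    def parse_primary():
        token = take()
        if token == "(":
            inner = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if token in (")", "and", "or"):
            raise ValueError(f"unexpected token: {token!r}")
        return [[token]]  # one alternative containing one term

    alternatives = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return [" and ".join(terms) for terms in alternatives]


for row in split_constraint("A and (B or C)"):
    print(row)
# A and B
# A and C
```

Mapping this function over the Constraint column and calling explode, as in the code above, would give one row per term, in the order the terms appear in the original constraint.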
