June 16, 2023, 00:09:50

How to logically combine the data in the columns of a pandas DataFrame to generate a new DataFrame?

Question


I've written a program that returns a "belonging table" DataFrame for the edges of a MultiGraph that models an electrical grid. Each row is a path between a load and a source, and the columns are the names of the lines that connect the load to the source.

The program generates an output DataFrame that looks like this one, but much larger.

import pandas as pd

belonging = pd.DataFrame({'A': {0: False, 1: False, 2: True, 3: True},
                          'B': {0: False, 1: False, 2: True, 3: True},
                          'C': {0: True, 1: True, 2: False, 3: False},
                          'D': {0: False, 1: True, 2: False, 3: True},
                          'E': {0: True, 1: False, 2: True, 3: False},
                          'F': {0: True, 1: True, 2: True, 3: True}})

>>>
       A      B      C      D      E     F
0  False  False   True  False   True  True
1  False  False   True   True  False  True
2   True   True  False  False   True  True
3   True   True  False   True  False  True

Now I need to generate a "failure modes" table whose output would look like this:

result = pd.DataFrame(
    {'Failure Modes': {0: 'F', 1: 'A // C', 2: 'B // C', 3: 'D // E'},
     'Order of Failure': {0: 1, 1: 2, 2: 2, 3: 2}
    }
)
>>>
  Failure Modes  Order of Failure
0             F                 1
1        A // C                 2
2        B // C                 2
3        D // E                 2

The failure modes table is constructed from the boolean values of the columns. If all the items in a column are True, that column is a first-order failure. The second order checks the truth values of every pair of columns, except those already found to be first-order failures.
And so on up to the nth order, with n <= len(belonging.columns).

Describing it makes it sound simpler to me than I can write in code. Thank you in advance.
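A minimal sketch of the rule as described, using the six-column table printed above. It assumes that a pair qualifies when every row has a True in at least one of the two columns; the question does not say whether the True values must be exclusive, so that reading is an assumption.

```python
import pandas as pd

# The six-column table printed in the question
belonging = pd.DataFrame({
    'A': [False, False, True, True],
    'B': [False, False, True, True],
    'C': [True, True, False, False],
    'D': [False, True, False, True],
    'E': [True, False, True, False],
    'F': [True, True, True, True],
})

# First order: columns that are True on every path (row)
first_order = belonging.columns[belonging.all()].tolist()

# Second-order check for two sample pairs,
# assuming "at least one True per row" is the intended test
pair_ac = bool((belonging['A'] | belonging['C']).all())
pair_ad = bool((belonging['A'] | belonging['D']).all())

print(first_order)  # ['F']
print(pair_ac)      # True
print(pair_ad)      # False
```

Here `A // C` passes because rows 0 and 1 are covered by `C` and rows 2 and 3 by `A`, while `A // D` fails on row 0.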

Answer 1

Score: 0

I would first identify the first-order columns, then test all pairs of the remaining columns with [`itertools.combinations`](https://docs.python.org/3/library/itertools.html#itertools.combinations) and [`numpy.logical_xor`](https://numpy.org/doc/stable/reference/generated/numpy.logical_xor.html), and finally combine the results with [`pandas.concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):

```python
from itertools import combinations

import numpy as np
import pandas as pd

# First-order failures: columns that are True in every row
first = belonging.columns[belonging.all()]

# Exclude them before testing pairs
tmp = belonging.drop(columns=first)

out = pd.concat([
    pd.DataFrame({'Failure Modes': first, 'Order of Failure': 1}),
    pd.DataFrame({'Failure Modes': [f'{a}//{b}' for a, b in combinations(tmp, 2)
                                    if np.logical_xor(tmp[a], tmp[b]).all()],
                  'Order of Failure': 2})
], ignore_index=True)
```

*NB. The example is ambiguous, so if you do not need exclusive `True` values, you can use [`np.logical_or`](https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html) in place of `np.logical_xor`.*

Output:

```
  Failure Modes  Order of Failure
0             F                 1
1          A//C                 2
2          B//C                 2
3          D//E                 2
```
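To illustrate the NB above, here is a small sketch of a case where the two operators disagree. The frame `demo` and its columns `X` and `Y` are made up for the illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: both columns are True in row 0
demo = pd.DataFrame({
    'X': [True, True, False],
    'Y': [True, False, True],
})

# XOR requires exactly one True per row, so row 0 fails the test
xor_ok = bool(np.logical_xor(demo['X'], demo['Y']).all())

# OR only requires at least one True per row
or_ok = bool(np.logical_or(demo['X'], demo['Y']).all())

print(xor_ok, or_ok)  # False True
```

On the question's table the two operators select the same three pairs, because none of the qualifying pairs has a row where both columns are True.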



# Answer 2
**Score**: 0

To get the nth-order failure modes, you'll need to construct the powerset of your columns and then evaluate your logical operation:

```python
import pandas as pd
from itertools import combinations
from numpy import logical_xor

belonging = pd.DataFrame({
    'A': {0: False, 1: False, 2: True, 3: True},
    'B': {0: False, 1: False, 2: True, 3: True},
    'C': {0: True, 1: True, 2: False, 3: False},
    'D': {0: False, 1: True, 2: False, 3: True},
    'E': {0: True, 1: False, 2: True, 3: False},
    'F': {0: True, 1: True, 2: True, 3: True},
})

def powerset(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

failures = {}
for cols in powerset(belonging.columns):
    if len(cols) == 0: continue
    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

test_df = pd.DataFrame.from_dict(failures, orient='index')

print(test_df)
```

```
                  failure  order
A                   False      1
B                   False      1
C                   False      1
D                   False      1
E                   False      1
...                   ...    ...
A//B//C//E//F       False      5
A//B//D//E//F       False      5
A//C//D//E//F        True      5
B//C//D//E//F        True      5
A//B//C//D//E//F    False      6

[63 rows x 2 columns]
```

Now that we have evaluated our test on every member of the powerset, we can use a simple filtering operation to locate the members that met our logical test:

```python
print(
    test_df.loc[lambda d: d['failure']]
)
```

```
               failure  order
F                 True      1
A//C              True      2
B//C              True      2
D//E              True      2
A//B//F           True      3
A//B//D//E        True      4
A//C//D//E//F     True      5
B//C//D//E//F     True      5
```

To account for columns that you have encountered in a previous order, we can update the for-loop as follows:

```python
from collections import defaultdict

# same as the above code (data & powerset fn)
# ...

failures = {}
seen = defaultdict(set)
for cols in powerset(belonging.columns):
    if len(cols) == 0 or seen[len(cols)].issuperset(cols):
        continue

    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

    # update current order seen data w/ previous order seen data
    seen[len(cols)] = seen[len(cols)].union(seen[len(cols)-1])

    # update current order seen data w/ columns that exhibit a failure
    if failures['//'.join(cols)]['failure']:
        seen[len(cols)] = seen[len(cols)].union(cols)

    # break out early once we have attributed failures to all columns
    if len(seen[len(cols)]) == len(belonging.columns):
        break

test_df = pd.DataFrame.from_dict(failures, orient='index')

print(test_df)
```

```
      failure  order
A       False      1
B       False      1
C       False      1
D       False      1
E       False      1
F        True      1
A//B    False      2
A//C     True      2
A//D    False      2
A//E    False      2
B//C     True      2
B//D    False      2
B//E    False      2
C//D    False      2
C//E    False      2
D//E     True      2
```

```python
print(test_df.loc[lambda d: d['failure']])
```

```
      failure  order
F        True      1
A//C     True      2
B//C     True      2
D//E     True      2
```
Answer 3

Score: 0


Using the solutions presented by @Cammeron Riddel and @Mozway, I arrived at this solution to the problem:

from collections import defaultdict
from itertools import combinations

import pandas as pd
from numpy import logical_xor

bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
# bt = belonging_table

def power_set(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

seen = defaultdict(set)
failures = pd.DataFrame(columns=['Order'])
for columns in power_set(bt.columns):

    order = len(columns)
    last_order = order - 1

    # carry the columns seen at the previous order into the current one
    seen[order] = seen[order].union(seen[last_order])

    # skip the empty set and any combination that reuses a column
    # already attributed to a lower-order failure
    if len(columns) == 0 or set(columns).intersection(seen[last_order]):
        continue

    if logical_xor.reduce(bt[list(columns)], axis=1).all():
        failures.loc['//'.join([str(edge) for edge in columns])] = order
        seen[order] = seen[order].union(columns)

    if len(seen[order]) == len(bt.columns):
        break

failures

      Order
f         1
a//c      2
b//c      2
d//e      2

These changes were made because of errors found with larger origin-destination pairs that have up to 500 different paths.
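If the table should match the layout requested in the question, the index of `failures` can be turned into a column. This is only a sketch: the frame below is a stand-in rebuilt from the printed output above, not the result of running the loop.

```python
import pandas as pd

# Stand-in for the `failures` frame printed above
failures = pd.DataFrame(
    {'Order': [1, 2, 2, 2]},
    index=['f', 'a//c', 'b//c', 'd//e'],
)

result = (
    failures
    .rename_axis('Failure Modes')                    # name the index
    .reset_index()                                   # move it into a column
    .rename(columns={'Order': 'Order of Failure'})
)

# Use the ' // ' separator shown in the question
result['Failure Modes'] = result['Failure Modes'].str.replace('//', ' // ', regex=False)

print(result)
```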
