如何在 pandas DataFrame 的列上逻辑组合数据以生成新的 DataFrame?

huangapple go评论135阅读模式
英文:

how to combine logically the data on columns of a pandas DataFrame to generate a new DataFrame?

问题

我已经制作了一个返回“归属表”DataFrame的程序,该表显示了模拟电网的Multigraph边缘。每一行都是负载和源之间的路径,列是连接负载和源的线路的名称。

我编写的程序生成一个类似于下面这个的输出df,但要大得多。

import pandas as pd

belonging = pd.DataFrame({'A': {0: False, 1: False, 2: True, 3: True},
                         'B': {0: True, 1: True, 2: False, 3: False},
                         'C': {0: False, 1: True, 2: False, 3: True},
                         'D': {0: True, 1: False, 2: True, 3: False}})

>>>
       A      B      C      D      E     F
0  False  False   True  False   True  True
1  False  False   True   True  False  True
2   True   True  False  False   True  True
3   True   True  False   True  False  True

现在,我需要生成一个“故障模式”表,其输出应该像这样:

result = pd.DataFrame(
    {'Failure Modes': {0: 'F', 1: 'A // C', 2: 'B // C', 3: 'D // E'},
     'Order of Failure': {0: 1, 1: 2, 2: 2, 3: 2}
    }
)
>>>
  Failure Modes  Order of Failure
0             F                 1
1        A // C                 2
2        B // C                 2
3        D // E                 2

故障模式表是根据列的布尔值构建的,如果某列中的所有项目都为真,那么它是第一级故障。第二级故障尝试检查除已经确定为第一级的列之外的每两列的真值。
以此类推,直到第n级,其中n <= len(belonging.columns)。

在代码中实现起来比我描述起来要简单。非常感谢!

英文:

I've made a programm that return a "belonging table" DataFrame of edges of Multigraph that models a Electrical Grid.
each line is a a path between a load and a source, the columns are the names of the lines that connect the load to the source.

The program I wrote generates an output df that looks like this one, but much larger.

import pandas as pd

belonging = pd.DataFrame({&#39;A&#39;: {0: False, 1: False, 2: True, 3: True},
              &#39;B&#39;: {0: True, 1: True, 2: False, 3: False},
              &#39;C&#39;: {0: False, 1: True, 2: False, 3: True},
              &#39;D&#39;: {0: True, 1: False, 2: True, 3: False}})

&gt;&gt;&gt;
       A      B      C      D      E     F
0  False  False   True  False   True  True
1  False  False   True   True  False  True
2   True   True  False  False   True  True
3   True   True  False   True  False  True

Now I need to generate a "failure modes" table, that give an output that would be like this:

result = pd.DataFrame(
    {&#39;Failure Modes&#39; : {0: &#39;F&#39;, 1: &#39;A // C&#39;, 2: &quot;B // C&quot;, 3: &quot;D // E&quot;},
     &#39;Order of Failure&#39; : {0: 1, 1: 2, 2: 2, 3: 2}
    }
)
&gt;&gt;&gt; 
  Failure Modes  Order of Failure
0             F                 1
1        A // C                 2
2        B // C                 2
3        D // E                 2

The failure Modes table is contructed from the boolean value of the columns, if all the items in a column are true, than that is a first order of faliure. The second order of faliure tries to check for the truth values of every two columns, except those already found to be a first order.
And so on the nth-order. with n &lt;= len(belonging.columns).

Describing it makes it sound simpler to me than I can write in code. Thank you in advance.

答案1

得分: 0

首先我会识别第一阶段的列然后使用 [`itertools.combinations`](https://docs.python.org/3/library/itertools.html#itertools.combinations) 和 [`numpy.logical_xor`](https://numpy.org/doc/stable/reference/generated/numpy.logical_xor.html) 测试所有剩余列的配对,最后用 [`pandas.concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html) 组合结果:

```python
from itertools import combinations

first = belonging.columns[belonging.all()]

tmp = belonging.drop(columns=first)

out = pd.concat([
    pd.DataFrame({'失效模式': first, '失效顺序': 1}),
    pd.DataFrame({'失效模式': [f'{a}//{b}' for a,b in combinations(tmp, 2)
                                    if np.logical_xor(tmp[a], tmp[b]).all()],
                  '失效顺序': 2})
], ignore_index=True)

注意:示例不够明确,所以如果你不需要独占的 True 值,可以使用 np.logical_or 替代 np.logical_xor

输出:

  失效模式  失效顺序
0     F     1
1  A//C     2
2  B//C     2
3  D//E     2

<details>
<summary>英文:</summary>

I would first identify the first order columns, then test all pairs of remaining columns with [`itertools.combinations`](https://docs.python.org/3/library/itertools.html#itertools.combinations) and [`numpy.logical_xor`](https://numpy.org/doc/stable/reference/generated/numpy.logical_xor.html), finally combine the results with [`pandas.concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):

from itertools import combinations

first = belonging.columns[belonging.all()]

tmp = belonging.drop(columns=first)

out = pd.concat([
pd.DataFrame({'Failure Modes': first, 'Order of Failure': 1}),
pd.DataFrame({'Failure Modes': [f'{a}//{b}' for a,b in combinations(tmp, 2)
if np.logical_xor(tmp[a], tmp[b]).all()],
'Order of Failure': 2})
], ignore_index=True)

*NB. the example is ambiguous, so if you don&#39;t not need exclusive `True` values, you can use [`np.logical_or`](https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html) in place of `np.logical_xor`.*

Output:

Failure Modes Order of Failure
0 F 1
1 A//C 2
2 B//C 2
3 D//E 2


</details>



# 答案2
**得分**: 0

获取第n阶故障模式,您需要构建每列的幂集,然后评估您的逻辑操作:

```python
import pandas as pd
from itertools import combinations
from numpy import logical_xor

belonging = pd.DataFrame({
    'A': {0: False, 1: False, 2: True, 3: True},
    'B': {0: False, 1: False, 2: True, 3: True},
    'C': {0: True, 1: True, 2: False, 3: False},
    'D': {0: False, 1: True, 2: False, 3: True},
    'E': {0: True, 1: False, 2: True, 3: False},
    'F': {0: True, 1: True, 2: True, 3: True},
})

def powerset(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

failures = {}
for cols in powerset(belonging.columns):
    if len(cols) == 0: continue
    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

test_df = pd.DataFrame.from_dict(failures, orient='index')

print(test_df)

现在,我们已经对幂集中的每个成员进行了测试,我们可以使用简单的过滤操作来定位满足逻辑测试的成员:

print(
    test_df.loc[lambda d: d['failure']]
)

要考虑到您之前遇到的列,可以将for循环更新如下:

from collections import defaultdict

# 与上面的代码相同(数据和powerset函数)
# ...

failures = {}
seen = defaultdict(set)
for cols in powerset(belonging.columns):
    if len(cols) == 0 or seen[len(cols)].issuperset(cols):
        continue

    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

    # 更新当前阶段已看到的数据与前一阶段已看到的数据
    seen[len(cols)] = seen[len(cols)].union(seen[len(cols)-1])

    # 更新当前阶段已看到的数据与表现出故障的列
    if failures['//'.join(cols)]['failure']:
        seen[len(cols)] = seen[len(cols)].union(cols)

    # 一旦将故障归因给所有列,即提前退出
    if len(seen[len(cols)]) == len(belonging.columns):
        break

test_df = pd.DataFrame.from_dict(failures, orient='index')

print(test_df)

print(test_df.loc[lambda d: d['failure']])
英文:

To get the nth-order failure modes, you'll need to construct the powerset of each of your columns and then evaluate your logical operaiton:

import pandas as pd
from itertools import combinations
from numpy import logical_xor

belonging = pd.DataFrame({
    &#39;A&#39;: {0: False, 1: False, 2: True, 3: True},
    &#39;B&#39;: {0: False, 1: False, 2: True, 3: True},
    &#39;C&#39;: {0: True, 1: True, 2: False, 3: False},
    &#39;D&#39;: {0: False, 1: True, 2: False, 3: True},
    &#39;E&#39;: {0: True, 1: False, 2: True, 3: False},
    &#39;F&#39;: {0: True, 1: True, 2: True, 3: True},
})

def powerset(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

failures = {}
for cols in powerset(belonging.columns):
    if len(cols) == 0: continue
    failures[&#39;//&#39;.join(cols)] = {
        &#39;failure&#39;: logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        &#39;order&#39;: len(cols)
    }

test_df = pd.DataFrame.from_dict(failures, orient=&#39;index&#39;)

print(test_df)
                  failure  order
A                   False      1
B                   False      1
C                   False      1
D                   False      1
E                   False      1
...                   ...    ...
A//B//C//E//F       False      5
A//B//D//E//F       False      5
A//C//D//E//F        True      5
B//C//D//E//F        True      5
A//B//C//D//E//F    False      6

[63 rows x 2 columns]

Now that we have evaluated our test on every member of our powerset, we can use a simple filtering operation to locate the members who met our logical test:

print(
    test_df.loc[lambda d: d[&#39;failure&#39;]]
)

               failure  order
F                 True      1
A//C              True      2
B//C              True      2
D//E              True      2
A//B//F           True      3
A//B//D//E        True      4
A//C//D//E//F     True      5
B//C//D//E//F     True      5

To account for columns that you have encountered in a previous order, we can update our for-loop as follows:

from collections import defaultdict

# same as the above code (data &amp; powerset fn)
# ...

failures = {}
seen = defaultdict(set)
for cols in powerset(belonging.columns):
    if len(cols) == 0 or seen[len(cols)].issuperset(cols):
        continue

    failures[&#39;//&#39;.join(cols)] = {
        &#39;failure&#39;: logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        &#39;order&#39;: len(cols)
    }

    # updated current order seen data w/ previous order seen data
    seen[len(cols)] = seen[len(cols)].union(seen[len(cols)-1])

    # update current order seen data w/ columns that exhibit a failure
    if failures[&#39;//&#39;.join(cols)][&#39;failure&#39;]:
        seen[len(cols)] = seen[len(cols)].union(cols)

    # break out early once we have attributed failures to all columns
    if len(seen[len(cols)]) == len(belonging.columns):
        break

test_df = pd.DataFrame.from_dict(failures, orient=&#39;index&#39;)

print(test_df)
      failure  order
A       False      1
B       False      1
C       False      1
D       False      1
E       False      1
F        True      1
A//B    False      2
A//C     True      2
A//D    False      2
A//E    False      2
B//C     True      2
B//D    False      2
B//E    False      2
C//D    False      2
C//E    False      2
D//E     True      2

print(test_df.loc[lambda d: d[&#39;failure&#39;]])
      failure  order
F        True      1
A//C     True      2
B//C     True      2
D//E     True      2

答案3

得分: 0

使用由@Cammeron Riddel和@Mozway提出的解决方案,我得到了以下解决问题的解决方案:

bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
# bt = 属于表

def power_set(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

seen = defaultdict(set)
failures = pd.DataFrame(columns=['Order'])
for columns in power_set(bt.columns):

    order = len(columns)
    last_order = order - 1

    seen[order] = seen[order].union(seen[last_order])

    if len(columns) == 0 or set(columns).intersection(seen[last_order]):
        continue

    if logical_xor.reduce(bt[list(columns)], axis=1).all():
        failures.loc['//'.join([str(edge) for edge in columns])] = order
        seen[order] = seen[order].union(columns)

    if len(seen[order]) == len(bt.columns):
        break

failures

这些更改是因为在包含多达500个不同路径的较大起点和终点对中发现了错误。

英文:

Using the soluting presented by @Cammeron Riddel and @Mozway I got to this solution to the problem:

bt = pd.DataFrame({
    &#39;a&#39;: [False, False, True,True],
    &#39;b&#39;: [False, False, True, True],
    &#39;c&#39;: [True, True, False, False],
    &#39;d&#39;: [False, True, False, True],
    &#39;e&#39;: [True, False, True, False],
    &#39;f&#39;: [True, True, True, True],
    })
# bt = belonging_table

def power_set(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

seen = defaultdict(set)
failures = pd.DataFrame(columns=[&#39;Oder&#39;])
for columns in power_set(bt.columns):

    order = len(columns)
    last_order = order - 1

    seen[order] = seen[order].union(seen[last_order])

    if len(columns) == 0 or set(columns).intersection(seen[last_order]):
        continue

    if logical_xor.reduce(bt[list(columns)], axis=1).all():
        failures.loc[&#39;//&#39;.join([str(edge) for edge in columns])] = order
        seen[order] = seen[order].union(columns)

    if len(seen[order]) == len(bt.columns):
        break

failures

    Order
f	    1
a//c	2
b//c	2
d//e	2

This changes where made because of errors found in larger origin destiny pairs with up too 500 different paths

huangapple
  • 本文由 发表于 2023年6月16日 00:09:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76483571.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定