How to logically combine the data in the columns of a pandas DataFrame to generate a new DataFrame?
Question
I've made a program that returns a "belonging table" DataFrame of the edges of a multigraph that models an electrical grid.
Each row is a path between a load and a source, and the columns are the names of the lines that connect the load to the source.
The program I wrote generates an output df that looks like this one, but much larger.
```python
import pandas as pd

belonging = pd.DataFrame({'A': {0: False, 1: False, 2: True, 3: True},
                          'B': {0: False, 1: False, 2: True, 3: True},
                          'C': {0: True, 1: True, 2: False, 3: False},
                          'D': {0: False, 1: True, 2: False, 3: True},
                          'E': {0: True, 1: False, 2: True, 3: False},
                          'F': {0: True, 1: True, 2: True, 3: True}})
print(belonging)
```

```
A B C D E F
0 False False True False True True
1 False False True True False True
2 True True False False True True
3 True True False True False True
```
Now I need to generate a "failure modes" table that gives an output like this:
```python
result = pd.DataFrame(
    {'Failure Modes': {0: 'F', 1: 'A // C', 2: 'B // C', 3: 'D // E'},
     'Order of Failure': {0: 1, 1: 2, 2: 2, 3: 2}}
)
print(result)
```

```
Failure Modes Order of Failure
0 F 1
1 A // C 2
2 B // C 2
3 D // E 2
```
The failure modes table is constructed from the boolean values of the columns: if all the items in a column are True, that column is a first-order failure. The second order of failure checks the truth values of every pair of columns, excluding those already found to be first order.
And so on up to the nth order, with n <= len(belonging.columns).

It sounds simpler when I describe it than when I try to write it in code. Thank you in advance.
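To make the first two orders of this rule concrete, here is a minimal sketch on the table above. It assumes that a combination of lines "fails" when every path contains at least one of its lines, which is a logical OR across the columns; the question does not pin this down, so treat that choice as an assumption.

```python
import pandas as pd
from itertools import combinations

# Belonging table from the question: rows are paths, columns are lines
belonging = pd.DataFrame({
    'A': [False, False, True, True],
    'B': [False, False, True, True],
    'C': [True, True, False, False],
    'D': [False, True, False, True],
    'E': [True, False, True, False],
    'F': [True, True, True, True],
})

# First order: a line that appears in every path
first = [col for col in belonging.columns if belonging[col].all()]

# Second order: pairs of the remaining lines that together cover every path
# (assumption: "cover" means at least one of the two is True in each row)
rest = [col for col in belonging.columns if col not in first]
second = [f'{a} // {b}' for a, b in combinations(rest, 2)
          if (belonging[a] | belonging[b]).all()]

print(first)   # ['F']
print(second)  # ['A // C', 'B // C', 'D // E']
```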
Answer 1
Score: 0
I would first identify the first-order columns, then test all pairs of the remaining columns with [`itertools.combinations`](https://docs.python.org/3/library/itertools.html#itertools.combinations) and [`numpy.logical_xor`](https://numpy.org/doc/stable/reference/generated/numpy.logical_xor.html), and finally combine the results with [`pandas.concat`](https://pandas.pydata.org/docs/reference/api/pandas.concat.html):

```python
from itertools import combinations

import numpy as np
import pandas as pd

# Columns that are True in every row are first-order failures
first = belonging.columns[belonging.all()]
# Only the remaining columns are candidates for second-order failures
tmp = belonging.drop(columns=first)
out = pd.concat([
    pd.DataFrame({'Failure Modes': first, 'Order of Failure': 1}),
    pd.DataFrame({'Failure Modes': [f'{a}//{b}' for a, b in combinations(tmp, 2)
                                    if np.logical_xor(tmp[a], tmp[b]).all()],
                  'Order of Failure': 2})
], ignore_index=True)
```

*NB. The example is ambiguous, so if you don't need exclusive `True` values, you can use [`np.logical_or`](https://numpy.org/doc/stable/reference/generated/numpy.logical_or.html) in place of `np.logical_xor`.*

Output:

```
Failure Modes Order of Failure
0 F 1
1 A//C 2
2 B//C 2
3 D//E 2
```
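The XOR/OR choice matters as soon as a path uses both lines of a pair. The sketch below uses a hypothetical two-line table (columns `X` and `Y` are made up for illustration) where the third path goes through both lines. Losing X and Y still cuts every path, so if a failure mode means "every path loses at least one line", OR detects it while XOR does not.

```python
import numpy as np
import pandas as pd

# Hypothetical belonging table: the third path uses both line X and line Y
toy = pd.DataFrame({
    'X': [True, False, True],
    'Y': [False, True, True],
})

# OR: every path contains at least one of the two lines
covers = np.logical_or(toy['X'], toy['Y']).all()
# XOR: every path contains exactly one of the two lines
exclusive = np.logical_xor(toy['X'], toy['Y']).all()

print(covers)     # True
print(exclusive)  # False
```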
Answer 2
Score: 0
To get the nth-order failure modes, you'll need to construct the power set
of your columns and then evaluate your logical operation:
```python
import pandas as pd
from itertools import combinations
from numpy import logical_xor

belonging = pd.DataFrame({
    'A': {0: False, 1: False, 2: True, 3: True},
    'B': {0: False, 1: False, 2: True, 3: True},
    'C': {0: True, 1: True, 2: False, 3: False},
    'D': {0: False, 1: True, 2: False, 3: True},
    'E': {0: True, 1: False, 2: True, 3: False},
    'F': {0: True, 1: True, 2: True, 3: True},
})

def powerset(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

failures = {}
for cols in powerset(belonging.columns):
    if len(cols) == 0: continue
    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

test_df = pd.DataFrame.from_dict(failures, orient='index')
print(test_df)
```

```
failure order
A False 1
B False 1
C False 1
D False 1
E False 1
... ... ...
A//B//C//E//F False 5
A//B//D//E//F False 5
A//C//D//E//F True 5
B//C//D//E//F True 5
A//B//C//D//E//F False 6
[63 rows x 2 columns]
```
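The 63 rows come from the size of the power set: n columns give 2**n subsets, and dropping the empty one leaves 2**6 - 1 = 63 here. Because that count doubles with every extra line, a much larger grid may make full enumeration impractical. A minimal sketch, assuming you only care about orders up to some hypothetical cap `k`, is to call `combinations` per size instead of generating the whole power set:

```python
from itertools import combinations

def powerset(entities):
    # Same generator as above: all subsets, from size 0 up to len(entities)
    for i in range(len(entities) + 1):
        yield from combinations(entities, r=i)

subsets = list(powerset('ABCDEF'))
print(len(subsets))      # 64 == 2 ** 6, including the empty tuple
print(len(subsets) - 1)  # 63 non-empty subsets

# Hypothetical cap on the failure order: only sizes 1..k are generated
k = 2
bounded = [c for r in range(1, k + 1) for c in combinations('ABCDEF', r)]
print(len(bounded))      # 21 == 6 + 15
```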
Now that we have evaluated our test on every member of our powerset, we can use a simple filtering operation to locate the members who met our logical test:
```python
print(
    test_df.loc[lambda d: d['failure']]
)
```

```
failure order
F True 1
A//C True 2
B//C True 2
D//E True 2
A//B//F True 3
A//B//D//E True 4
A//C//D//E//F True 5
B//C//D//E//F True 5
```
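Entries such as A//B//F and A//B//D//E pass because `logical_xor.reduce` across more than two columns tests parity: a row is True when it has an odd number of True values, not exactly one. A and B are identical, so they cancel and only F remains. The sketch below shows this on the A, B and F columns; if "exactly one line per path" is the intended rule for higher orders, it has to be tested explicitly, for example with a row sum.

```python
import numpy as np
import pandas as pd

cols = pd.DataFrame({
    'A': [False, False, True, True],
    'B': [False, False, True, True],
    'F': [True, True, True, True],
})

# XOR reduced across each row is True when the row has an odd number of True values
parity = np.logical_xor.reduce(cols.to_numpy(), axis=1)
print(parity)       # [ True  True  True  True]

# "Exactly one True per row" has to be checked separately
exactly_one = (cols.sum(axis=1) == 1).to_numpy()
print(exactly_one)  # [ True  True False False]
```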
To account for columns that you have encountered in a previous order, we can update our for-loop as follows:
```python
from collections import defaultdict

# same as the above code (data & powerset fn)
# ...

failures = {}
seen = defaultdict(set)
for cols in powerset(belonging.columns):
    if len(cols) == 0 or seen[len(cols)].issuperset(cols):
        continue

    failures['//'.join(cols)] = {
        'failure': logical_xor.reduce(belonging[list(cols)], axis=1).all(),
        'order': len(cols)
    }

    # update current order seen data w/ previous order seen data
    seen[len(cols)] = seen[len(cols)].union(seen[len(cols)-1])

    # update current order seen data w/ columns that exhibit a failure
    if failures['//'.join(cols)]['failure']:
        seen[len(cols)] = seen[len(cols)].union(cols)

    # break out early once we have attributed failures to all columns
    if len(seen[len(cols)]) == len(belonging.columns):
        break
```
```python
test_df = pd.DataFrame.from_dict(failures, orient='index')
print(test_df)
```

```
failure order
A False 1
B False 1
C False 1
D False 1
E False 1
F True 1
A//B False 2
A//C True 2
A//D False 2
A//E False 2
B//C True 2
B//D False 2
B//E False 2
C//D False 2
C//E False 2
D//E True 2
```

```python
print(test_df.loc[lambda d: d['failure']])
```

```
failure order
F True 1
A//C True 2
B//C True 2
D//E True 2
```
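The skip test in this loop, `seen[len(cols)].issuperset(cols)`, only skips a combination when every one of its columns has already been attributed to a failure. A combination that mixes an already-seen column with an unseen one (for example A//F while only F is known) can therefore still be evaluated, and with `logical_or` it would always pass because F alone is all True. If the intended rule is "never reuse a lower-order column", an intersection test expresses that. The snippet contrasts the two checks on a hypothetical state:

```python
# Hypothetical state: F was found as a first-order failure
seen_lower = {'F'}
cols = ('A', 'F')

# Superset test: skip only if *every* column was already seen
print(seen_lower.issuperset(cols))   # False, so A//F would be evaluated
# Intersection test: skip if *any* column was already seen
print(bool(set(cols) & seen_lower))  # True, so A//F would be skipped
```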
Answer 3
Score: 0
Using the solutions presented by @Cammeron Riddel and @Mozway, I arrived at this solution to the problem:
```python
from collections import defaultdict
from itertools import combinations

import pandas as pd
from numpy import logical_xor

bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
# bt = belonging_table

def power_set(entities):
    for i in range(len(entities)+1):
        yield from combinations(entities, r=i)

seen = defaultdict(set)
failures = pd.DataFrame(columns=['Order'])
for columns in power_set(bt.columns):
    order = len(columns)
    last_order = order - 1
    # carry over the columns already attributed at lower orders
    seen[order] = seen[order].union(seen[last_order])
    # skip the empty set and any combination that reuses a lower-order column
    if len(columns) == 0 or set(columns).intersection(seen[last_order]):
        continue
    if logical_xor.reduce(bt[list(columns)], axis=1).all():
        failures.loc['//'.join([str(edge) for edge in columns])] = order
        seen[order] = seen[order].union(columns)
    if len(seen[order]) == len(bt.columns):
        break

failures  # displayed in a notebook; use print(failures) in a script
```
```
Order
f 1
a//c 2
b//c 2
d//e 2
```
These changes were made because of errors I found in larger origin-destination pairs with up to 500 different paths.
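For reuse on larger tables, the loop above can be wrapped in a function. This is a sketch, not the author's code: `failure_modes`, `combine` and `max_order` are hypothetical names, the default `np.logical_or` is an assumption about what "failure" means, and unlike the loop above it finishes the current order before stopping, so same-order modes found after the last new column are not cut off. It joins names with `' // '` to match the table in the question.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
import pandas as pd


def failure_modes(bt, combine=np.logical_or, max_order=None):
    """Hypothetical wrapper around the power-set search above.

    combine: ufunc reduced across each combination of columns
        (assumption: np.logical_or = every path contains one of the lines).
    max_order: optional cap on the combination size (assumption).
    """
    max_order = max_order or len(bt.columns)
    seen = defaultdict(set)  # order -> columns used by failure modes so far
    modes, orders = [], []
    for order in range(1, max_order + 1):
        # Carry over the columns already attributed at lower orders
        seen[order] |= seen[order - 1]
        for columns in combinations(bt.columns, order):
            # Same exclusion rule as above: skip any reuse of a lower-order column
            if set(columns) & seen[order - 1]:
                continue
            if combine.reduce(bt[list(columns)].to_numpy(), axis=1).all():
                modes.append(' // '.join(map(str, columns)))
                orders.append(order)
                seen[order] |= set(columns)
        # Stop once every column belongs to some failure mode
        if len(seen[order]) == len(bt.columns):
            break
    return pd.DataFrame({'Failure Modes': modes, 'Order of Failure': orders})


bt = pd.DataFrame({
    'a': [False, False, True, True],
    'b': [False, False, True, True],
    'c': [True, True, False, False],
    'd': [False, True, False, True],
    'e': [True, False, True, False],
    'f': [True, True, True, True],
})
out = failure_modes(bt)
print(out)
```

Passing `combine=np.logical_xor` reproduces the exclusive check used in the answers; on this small table both choices give the same four modes.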