英文:
Crosstab across 4 columns and multi-index output
问题
这是你的数据 -
import pandas as pd
a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0], [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A','B','C','D'])
期望的输出是四列和两个值之间的交叉表 -
iterables = [["A", "B", "C", "D"], [1, 0]]
index = pd.MultiIndex.from_product(iterables)
op = [[0,0,3,3,3,3,0,0], [0,0,2,2,1,3,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,3,1],
[0,0,0,0,0,0,3,3], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0]]
print(pd.DataFrame(op, index=index, columns=index))
我已经帮你翻译了代码和相关描述部分。
英文:
Here is my data -
import pandas as pd
a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0], [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A','B','C','D'])
A B C D
0 1 0 1 1
1 1 1 0 0
2 1 1 1 1
3 1 1 1 1
4 0 0 1 0
5 0 1 0 0
6 0 0 0 0
7 1 0 0 1
8 1 0 0 1
9 0 1 0 1
The desired output is a cross tab of counts between different combinations of the 4 columns and two values -
iterables = [["A", "B", "C", "D"], [1, 0]]
index = pd.MultiIndex.from_product(iterables)
op = [[0,0,3,3,3,3,0,0], [0,0,2,2,1,3,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,3,1],
[0,0,0,0,0,0,3,3], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0]]
print(pd.DataFrame(op, index=index, columns=index))
A B C D
1 0 1 0 1 0 1 0
A 1 0 0 3 3 3 3 0 0
0 0 0 2 2 1 3 0 0
B 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 3 1
C 1 0 0 0 0 0 0 3 3
0 0 0 0 0 0 0 0 0
D 1 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
I have tried pd.crosstab()
and only seems to support two columns, not sure. Also tried pivot tables without luck. Please help.
答案1
得分: 1
你可以计算每个索引中每个值的出现次数,然后使用矩阵乘法:
tmp = (df.rename_axis(columns='c', index='i')
.stack().to_frame(name='v')
.reset_index()
.groupby(['c','v'])['i'].value_counts()
.unstack('i', fill_value=0)
)
tmp @ tmp.T
输出:
c A B C D
v 0 1 0 1 0 1 0 1
c v
A 0 4 0 2 2 3 1 3 1
1 0 6 3 3 3 3 1 5
B 0 2 3 5 0 3 2 2 3
1 2 3 0 5 3 2 2 3
C 0 3 3 3 3 6 0 3 3
1 1 3 2 2 0 4 1 3
D 0 3 1 2 2 3 1 4 0
1 1 5 3 3 3 3 0 6
英文:
You can count the occurrences of each values in each index, then use matrix multiplication:
tmp = (df.rename_axis(columns='c', index='i')
.stack().to_frame(name='v')
.reset_index()
.groupby(['c','v'])['i'].value_counts()
.unstack('i', fill_value=0)
)
tmp @ tmp.T
Output:
c A B C D
v 0 1 0 1 0 1 0 1
c v
A 0 4 0 2 2 3 1 3 1
1 0 6 3 3 3 3 1 5
B 0 2 3 5 0 3 2 2 3
1 2 3 0 5 3 2 2 3
C 0 3 3 3 3 6 0 3 3
1 1 3 2 2 0 4 1 3
D 0 3 1 2 2 3 1 4 0
1 1 5 3 3 3 3 0 6
答案2
得分: 0
你可以先计算一个长的DataFrame,然后使用crosstab
函数:
from itertools import product
tmp = pd.concat([pd.DataFrame({'col': a,
'idx': b,
'valc': df[a],
'vali': df[b]
})
for a, b in product(df, repeat=2)]
)
out = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])
注:由于输出是对称的,你也可以计算组合而不是乘积。
英文:
You can first compute a long DataFrame, then the crosstab
:
from itertools import product
tmp = pd.concat([pd.DataFrame({'col': a,
'idx': b,
'valc': df[a],
'vali': df[b]
})
for a, b in product(df, repeat=2)]
)
out = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])
NB. as the output is symmetric, you can also compute the combinations instead of the product.
Output:
col A B C D
valc 0 1 0 1 0 1 0 1
idx vali
A 0 4 0 2 2 3 1 3 1
1 0 6 3 3 3 3 1 5
B 0 2 3 5 0 3 2 2 3
1 2 3 0 5 3 2 2 3
C 0 3 3 3 3 6 0 3 3
1 1 3 2 2 0 4 1 3
D 0 3 1 2 2 3 1 4 0
1 1 5 3 3 3 3 0 6
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论