在四列之间制作交叉表,并生成多重索引输出。

huangapple go评论61阅读模式
英文:

Crosstab across 4 columns and multi-index output

问题

这是你的数据 -

import pandas as pd
a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0], [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A','B','C','D'])

期望的输出是四列和两个值之间的交叉表 -

iterables = [["A", "B", "C", "D"], [1, 0]]
index = pd.MultiIndex.from_product(iterables)

op = [[0,0,3,3,3,3,0,0], [0,0,2,2,1,3,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,3,1], 
      [0,0,0,0,0,0,3,3], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0]]

print(pd.DataFrame(op, index=index, columns=index))

我已经帮你翻译了代码和相关描述部分。

英文:

Here is my data -

import pandas as pd
a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0], [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A','B','C','D'])

   A  B  C  D
0  1  0  1  1
1  1  1  0  0
2  1  1  1  1
3  1  1  1  1
4  0  0  1  0
5  0  1  0  0
6  0  0  0  0
7  1  0  0  1
8  1  0  0  1
9  0  1  0  1

The desired output is a cross tab of counts between different combinations of the 4 columns and two values -

iterables = [["A", "B", "C", "D"], [1, 0]]
index = pd.MultiIndex.from_product(iterables)

op = [[0,0,3,3,3,3,0,0], [0,0,2,2,1,3,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,3,1], 
      [0,0,0,0,0,0,3,3], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0]]

print(pd.DataFrame(op, index=index, columns=index))

     A     B     C     D   
     1  0  1  0  1  0  1  0
A 1  0  0  3  3  3  3  0  0
  0  0  0  2  2  1  3  0  0
B 1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  3  1
C 1  0  0  0  0  0  0  3  3
  0  0  0  0  0  0  0  0  0
D 1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0

I have tried pd.crosstab() and only seems to support two columns, not sure. Also tried pivot tables without luck. Please help.

答案1

得分: 1

你可以计算每个索引中每个值的出现次数,然后使用矩阵乘法:

tmp = (df.rename_axis(columns='c', index='i')
         .stack().to_frame(name='v')
         .reset_index()
         .groupby(['c','v'])['i'].value_counts()
         .unstack('i', fill_value=0)
      )

tmp @ tmp.T

输出:

c    A     B     C     D   
v    0  1  0  1  0  1  0  1
c v                        
A 0  4  0  2  2  3  1  3  1
  1  0  6  3  3  3  3  1  5
B 0  2  3  5  0  3  2  2  3
  1  2  3  0  5  3  2  2  3
C 0  3  3  3  3  6  0  3  3
  1  1  3  2  2  0  4  1  3
D 0  3  1  2  2  3  1  4  0
  1  1  5  3  3  3  3  0  6
英文:

You can count the occurrences of each values in each index, then use matrix multiplication:

tmp = (df.rename_axis(columns='c', index='i')
         .stack().to_frame(name='v')
         .reset_index()
         .groupby(['c','v'])['i'].value_counts()
         .unstack('i', fill_value=0)
      )

tmp @ tmp.T

Output:

c    A     B     C     D   
v    0  1  0  1  0  1  0  1
c v                        
A 0  4  0  2  2  3  1  3  1
  1  0  6  3  3  3  3  1  5
B 0  2  3  5  0  3  2  2  3
  1  2  3  0  5  3  2  2  3
C 0  3  3  3  3  6  0  3  3
  1  1  3  2  2  0  4  1  3
D 0  3  1  2  2  3  1  4  0
  1  1  5  3  3  3  3  0  6

答案2

得分: 0

你可以先计算一个长的DataFrame,然后使用crosstab函数:

from itertools import product

tmp = pd.concat([pd.DataFrame({'col': a,
                               'idx': b,
                               'valc': df[a],
                               'vali': df[b]
                              })
                 for a, b in product(df, repeat=2)]
                )

out = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])

注:由于输出是对称的,你也可以计算组合而不是乘积。

英文:

You can first compute a long DataFrame, then the crosstab:


from itertools import product

tmp = pd.concat([pd.DataFrame({'col': a,
                               'idx': b,
                               'valc': df[a],
                               'vali': df[b]
                              })
                 for a, b in product(df, repeat=2)]
                )

out = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])

NB. as the output is symmetric, you can also compute the combinations instead of the product.

Output:

col       A     B     C     D   
valc      0  1  0  1  0  1  0  1
idx vali                        
A   0     4  0  2  2  3  1  3  1
    1     0  6  3  3  3  3  1  5
B   0     2  3  5  0  3  2  2  3
    1     2  3  0  5  3  2  2  3
C   0     3  3  3  3  6  0  3  3
    1     1  3  2  2  0  4  1  3
D   0     3  1  2  2  3  1  4  0
    1     1  5  3  3  3  3  0  6

huangapple
  • 本文由 发表于 2023年5月13日 13:52:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/76241290.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定