2023-05-13 13:52:07

Crosstab across 4 columns and multi-index output

Question

Here is my data -

import pandas as pd
a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0], [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A','B','C','D'])

   A  B  C  D
0  1  0  1  1
1  1  1  0  0
2  1  1  1  1
3  1  1  1  1
4  0  0  1  0
5  0  1  0  0
6  0  0  0  0
7  1  0  0  1
8  1  0  0  1
9  0  1  0  1

The desired output is a crosstab of counts for every combination of the four columns and their two values (1 and 0):

iterables = [["A", "B", "C", "D"], [1, 0]]
index = pd.MultiIndex.from_product(iterables)

op = [[0,0,3,3,3,3,0,0], [0,0,2,2,1,3,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,3,1], 
      [0,0,0,0,0,0,3,3], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0], [0,0,0,0,0,0,0,0]]

print(pd.DataFrame(op, index=index, columns=index))

     A     B     C     D   
     1  0  1  0  1  0  1  0
A 1  0  0  3  3  3  3  0  0
  0  0  0  2  2  1  3  0  0
B 1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  3  1
C 1  0  0  0  0  0  0  3  3
  0  0  0  0  0  0  0  0  0
D 1  0  0  0  0  0  0  0  0
  0  0  0  0  0  0  0  0  0

I have tried pd.crosstab(), but it only seems to support two columns, though I am not sure. I also tried pivot tables without luck. Please help.

Answer 1

Score: 1

You can count the occurrences of each value of each column at each row index, then use matrix multiplication:

# Long format: one row per (row index i, column c) holding its value v,
# then count per (c, v) and spread the row index i across the columns.
tmp = (df.rename_axis(columns='c', index='i')
         .stack().to_frame(name='v')
         .reset_index()
         .groupby(['c','v'])['i'].value_counts()
         .unstack('i', fill_value=0)
      )

tmp @ tmp.T

Output:

c    A     B     C     D   
v    0  1  0  1  0  1  0  1
c v                        
A 0  4  0  2  2  3  1  3  1
  1  0  6  3  3  3  3  1  5
B 0  2  3  5  0  3  2  2  3
  1  2  3  0  5  3  2  2  3
C 0  3  3  3  3  6  0  3  3
  1  1  3  2  2  0  4  1  3
D 0  3  1  2  2  3  1  4  0
  1  1  5  3  3  3  3  0  6
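
To see why the matrix product yields co-occurrence counts, here is a minimal sketch that builds the same kind of 0/1 indicator matrix more directly. The names `cols`, `ind` and `co` are illustrative and not part of the answer above; each column of `ind` marks the rows where one (column, value) pair holds, so `ind.T @ ind` counts the rows where two pairs hold at once.

```python
import numpy as np
import pandas as pd

a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0],
     [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A', 'B', 'C', 'D'])

# Every (column, value) pair, in the same order as the output above.
cols = pd.MultiIndex.from_product([df.columns, [0, 1]], names=['c', 'v'])

# Indicator matrix: entry is 1 where df[c] == v for that row, else 0.
values = np.column_stack([(df[c] == v).to_numpy().astype(int) for c, v in cols])
ind = pd.DataFrame(values, index=df.index, columns=cols)

# Dot product of two indicator columns = number of rows where both pairs hold.
co = ind.T @ ind
print(co)
```

Under these assumptions `co` should agree with `tmp @ tmp.T`; for example, the entry for (A, 1) against (B, 1) counts the rows where both A and B equal 1.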

Answer 2

Score: 0

You can first build a long DataFrame, then compute the crosstab:

from itertools import product

tmp = pd.concat([pd.DataFrame({'col': a,
                               'idx': b,
                               'valc': df[a],
                               'vali': df[b]
                              })
                 for a, b in product(df, repeat=2)]
                )

out = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])

NB: as the output is symmetric, you can also compute only the combinations of column pairs instead of the full product.

Output:

col       A     B     C     D   
valc      0  1  0  1  0  1  0  1
idx vali                        
A   0     4  0  2  2  3  1  3  1
    1     0  6  3  3  3  3  1  5
B   0     2  3  5  0  3  2  2  3
    1     2  3  0  5  3  2  2  3
C   0     3  3  3  3  6  0  3  3
    1     1  3  2  2  0  4  1  3
D   0     3  1  2  2  3  1  4  0
    1     1  5  3  3  3  3  0  6
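
The note about combinations can be sketched as follows. This is one possible reading of it, not code from the answer: it assumes `itertools.combinations_with_replacement` for the column pairs, and the names `frames`, `upper`, `u` and `full` are illustrative. The crosstab of the reduced long frame fills only the upper-triangular blocks, and the lower half is then mirrored from it.

```python
from itertools import combinations_with_replacement

import numpy as np
import pandas as pd

a = [[1,0,1,1], [1,1,0,0], [1,1,1,1], [1,1,1,1], [0,0,1,0],
     [0,1,0,0], [0,0,0,0], [1,0,0,1], [1,0,0,1], [0,1,0,1]]
df = pd.DataFrame(a, columns=['A', 'B', 'C', 'D'])

# Long frame over unordered column pairs only (10 pairs instead of 16).
frames = [pd.DataFrame({'idx': x, 'vali': df[x], 'col': y, 'valc': df[y]})
          for x, y in combinations_with_replacement(df.columns, 2)]
tmp = pd.concat(frames, ignore_index=True)

# Blocks below the diagonal stay at 0 because those pairs were never generated.
upper = pd.crosstab([tmp['idx'], tmp['vali']], [tmp['col'], tmp['valc']])

# Mirror the upper half; the main diagonal would otherwise be counted twice.
u = upper.to_numpy()
full = pd.DataFrame(u + u.T - np.diag(np.diag(u)),
                    index=upper.index, columns=upper.columns)
print(full)
```

If only the upper-triangular layout is wanted, similar to the shape sketched in the question, `upper` can be used as is; `full` should match the symmetric output shown above.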
