How do I merge categories for crosstab in pandas where some categories are common?

Question

A while ago I asked this question.

But that does not cover the case where two merged groups share a common category.

In that case I wanted to merge the categories A and B into AB. What if I have categories A, B and C, and I want to merge A and B into AB, and B and C into BC?

Suppose I have the data:

+---+---+
| X | Y |
+---+---+
| A | D |
| B | D |
| B | E |
| B | D |
| A | E |
| C | D |
| C | E |
| B | E |
+---+---+

I want the cross-tab to look like:

+--------+---+---+
|  X/Y   | D | E |
+--------+---+---+
| A or B | 3 | 3 |
| B or C | 3 | 3 |
| C      | 1 | 1 |
+--------+---+---+
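As a quick check of the expected table, here is a minimal sketch that rebuilds the sample data as a DataFrame (column names X and Y taken from the table above) and cross-tabulates the original categories. A and C each occur once with D and once with E, and B twice with each, so both "A or B" and "B or C" should come out as 3 and 3.

```python
import pandas as pd

# The sample data from the question, one row per observation
df = pd.DataFrame({'X': list('ABBBACCB'),
                   'Y': list('DDEDEDEE')})

# Plain crosstab over the original categories A, B and C
ct = pd.crosstab(df['X'], df['Y'])
print(ct)

# Merged rows are simply sums of the original rows; B counts in both
print((ct.loc['A'] + ct.loc['B']).tolist())
print((ct.loc['B'] + ct.loc['C']).tolist())
```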

Answer 1

Score: 1

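I think you can first build the crosstab over all unique values, then add each merged row by summing the rows of its member categories, selected by label from the index:

import pandas as pd

df = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})

df = pd.crosstab(df.X, df.Y)
# each merged row is the sum of its member rows; B is used in both
df.loc['A or B'] = df.loc[['A','B']].sum()
df.loc['B or C'] = df.loc[['B','C']].sum()
# drop the rows merged away; C stays because it is wanted on its own
df = df.drop(['A','B'])
print(df)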
Y       D  E
X           
C       1  1
A or B  3  3
B or C  3  3
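The row-summing approach above generalizes to any set of possibly overlapping groups. A minimal sketch, assuming the groups are given as a dict from each new row label to the list of original labels it combines; the helper name `merge_crosstab_rows` is hypothetical and not part of pandas:

```python
import pandas as pd

def merge_crosstab_rows(ct, groups):
    """Return a crosstab whose rows are sums of original rows.

    ct     -- crosstab over the original categories
    groups -- dict: new row label -> list of original row labels
              (the lists may overlap)
    """
    # Sum the member rows for each new label; a category listed in
    # several groups simply contributes to each of them
    rows = {label: ct.loc[members].sum() for label, members in groups.items()}
    # Each summed Series becomes a column, so transpose to get rows back
    return pd.DataFrame(rows).T

df = pd.DataFrame({'X': list('ABBBACCB'),
                   'Y': list('DDEDEDEE')})
ct = pd.crosstab(df['X'], df['Y'])

merged = merge_crosstab_rows(ct, {'A or B': ['A', 'B'],
                                  'B or C': ['B', 'C'],
                                  'C': ['C']})
print(merged)
```

Because every new row is computed from the untouched crosstab, overlapping groups need no special handling. In recent pandas versions, a label that is missing from the crosstab makes `.loc` raise a `KeyError`.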

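EDIT: A general solution on the raw data is not as easy, because rows that belong to more than one merged group have to be duplicated and relabeled, for example:

import pandas as pd

df = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})

# copies of the B rows, relabeled for the second group
df1 = df[df['X'] == 'B'].assign(X = 'B or C')
# keep the C rows aside so C also gets its own row
df2 = df[df['X'] == 'C']
df = pd.concat([df, df1], ignore_index=True)
# A and B count toward 'A or B', C toward 'B or C'
df['X'] = df['X'].replace({'A':'A or B', 'B': 'A or B', 'C': 'B or C'})
df = pd.concat([df, df2], ignore_index=True)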

df = pd.crosstab(df.X, df.Y)
print(df)
Y       D  E
X           
A or B  3  3
B or C  3  3
C       1  1

huangapple
  • Published on January 6, 2020 20:32:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59612201.html