How do I merge categories for crosstab in pandas where some categories are common?

Question

A while ago I asked this question.

However, that question does not cover the case where two merged categories share a member.

There, I wanted to merge categories A and B into AB. What if I have categories A, B, C and want to merge A and B into AB, and B and C into BC, so that B counts toward both?

Suppose I have the data:

    +---+---+
    | X | Y |
    +---+---+
    | A | D |
    | B | D |
    | B | E |
    | B | D |
    | A | E |
    | C | D |
    | C | E |
    | B | E |
    +---+---+

I want the cross-tab to look like:

    +--------+---+---+
    | X/Y    | D | E |
    +--------+---+---+
    | A or B | 3 | 3 |
    | B or C | 3 | 3 |
    | C      | 1 | 1 |
    +--------+---+---+
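
For reference, here is a minimal sketch that builds the sample data as a DataFrame and prints the plain, unmerged crosstab. The variable names `df` and `base` are only illustrative; each merged row requested above would be the sum of the per-category rows this prints.

```python
import pandas as pd

# Sample data from the question: one row per (X, Y) observation.
df = pd.DataFrame({
    "X": ["A", "B", "B", "B", "A", "C", "C", "B"],
    "Y": ["D", "D", "E", "D", "E", "D", "E", "E"],
})

# Plain crosstab: one row per original category, before any merging.
base = pd.crosstab(df["X"], df["Y"])
print(base)
```

Because B has to be counted in two output rows, a simple relabelling of X is not enough on its own.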

Answer 1

Score: 1

I think you can run crosstab on all the unique values first and then sum the rows for each merged category, selecting them by their index labels:

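    import pandas as pd

    # sample data from the question
    df = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})

    # count every (X, Y) pair for the original categories
    ct = pd.crosstab(df.X, df.Y)
    # each merged row is the sum of its member rows; B contributes to both
    ct.loc['A or B'] = ct.loc[['A', 'B']].sum()
    ct.loc['B or C'] = ct.loc[['B', 'C']].sum()
    # drop the rows that should not appear on their own (C stays)
    ct = ct.drop(['A', 'B'])
    print(ct)

    Y       D  E
    X
    C       1  1
    A or B  3  3
    B or C  3  3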
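
The same row-summing idea can be written once for any set of overlapping groups. The sketch below is not part of the original answer: it assumes a hypothetical `groups` dict that maps each output label to the original categories it should contain, and it rebuilds the question's sample data so that it runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "X": ["A", "B", "B", "B", "A", "C", "C", "B"],
    "Y": ["D", "D", "E", "D", "E", "D", "E", "E"],
})

# Hypothetical spec: output row label -> original categories summed into it.
groups = {
    "A or B": ["A", "B"],
    "B or C": ["B", "C"],
    "C": ["C"],
}

base = pd.crosstab(df["X"], df["Y"])

# Sum the member rows of each group; a category listed in several
# groups is counted once in each of them.
merged = pd.DataFrame(
    {label: base.loc[members].sum() for label, members in groups.items()}
).T
merged.index.name = "X"
print(merged)
```

The rows come out in the order the labels appear in `groups`, and a category missing from the data would raise a `KeyError` in `base.loc[members]`, which may or may not be what you want.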

EDIT: A general solution is less straightforward, because the rows that belong to more than one group have to be repeated and renamed before calling crosstab, like this:

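    # df is the original two-column data (X, Y), not the crosstab
    # copy of the B rows, relabelled so they also count toward 'B or C'
    df1 = df[df['X'] == 'B'].assign(X = 'B or C')
    # copy of the C rows, kept so that C still gets its own row
    df2 = df[df['X'] == 'C']
    df = pd.concat([df, df1], ignore_index=True)
    # relabel the original rows; the copies in df1 are left unchanged
    df['X'] = df['X'].replace({'A':'A or B', 'B': 'A or B', 'C': 'B or C'})
    # append the plain C rows only after the relabelling
    df = pd.concat([df, df2], ignore_index=True)
    df = pd.crosstab(df.X, df.Y)
    print (df)

    Y       D  E
    X
    A or B  3  3
    B or C  3  3
    C       1  1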
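
An alternative sketch of the general case, not taken from the original answer: map every original category to the list of output labels it should count toward, then use `DataFrame.explode` (available since pandas 0.25) so that a row appears once per label. The `membership` dict and the names `expanded` and `result` are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "X": ["A", "B", "B", "B", "A", "C", "C", "B"],
    "Y": ["D", "D", "E", "D", "E", "D", "E", "E"],
})

# Hypothetical mapping: original category -> every output row it counts toward.
membership = {
    "A": ["A or B"],
    "B": ["A or B", "B or C"],
    "C": ["B or C", "C"],
}

# Replace each X by its list of labels, then explode so that a row
# belonging to two groups appears once per group. The index is reset
# because explode repeats the original index labels.
expanded = (
    df.assign(X=df["X"].map(membership))
      .explode("X")
      .reset_index(drop=True)
)

result = pd.crosstab(expanded["X"], expanded["Y"])
print(result)
```

This keeps the overlap rules in one place, so adding another merged category only means editing `membership`.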

huangapple
  • Posted on January 6, 2020 at 20:32:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59612201.html