January 6, 2020, 20:32:44

How do I merge categories for crosstab in pandas where some categories are common?

Question

A while ago I asked a related question about merging categories for a crosstab.

But it does not cover the case where two merged categories share a member category.

In that case I wanted to merge categories A and B into AB. What if I have categories A, B, and C, and I want to merge A and B into AB, and B and C into BC?

Suppose I have the data:

+---+---+
| X | Y |
+---+---+
| A | D |
| B | D |
| B | E |
| B | D |
| A | E |
| C | D |
| C | E |
| B | E |
+---+---+

I want the cross-tab to look like:

+--------+---+---+
|  X/Y   | D | E |
+--------+---+---+
| A or B | 3 | 3 |
| B or C | 3 | 3 |
| C      | 1 | 1 |
+--------+---+---+

Answer 1

Score: 1

I think you can build the crosstab over all unique values first, then sum the rows of each merged category by selecting them from the index:

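import pandas as pd

# Sample data from the question
df = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})

# Count every original category first
df = pd.crosstab(df.X, df.Y)
# Add one row per merged group by summing its member rows
df.loc['A or B'] = df.loc[['A','B']].sum()
df.loc['B or C'] = df.loc[['C','B']].sum()
# Drop the rows that should not appear on their own (C is kept)
df = df.drop(['A','B'])
print (df)
Y       D  E
X           
C       1  1
A or B  3  3
B or C  3  3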
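
If the groups are known up front, the row-summing step can be driven by a dictionary instead of one line per group. This is only a minimal sketch, assuming the sample data from the question; `merged_crosstab` and `groups` are hypothetical names, not part of pandas:

```python
import pandas as pd

def merged_crosstab(data, row, col, groups):
    """Crosstab whose rows are (possibly overlapping) groups of categories."""
    base = pd.crosstab(data[row], data[col])
    # reindex tolerates members that never occur in the data (counted as 0)
    summed = {label: base.reindex(members, fill_value=0).sum()
              for label, members in groups.items()}
    # One column per label so far; transpose so each label becomes a row
    return pd.DataFrame(summed).T.rename_axis(index=row, columns=col)

data = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})
groups = {'A or B': ['A', 'B'], 'B or C': ['B', 'C'], 'C': ['C']}
table = merged_crosstab(data, 'X', 'Y', groups)
print(table)
```

The rows come out in the order of the keys in `groups`, and a category may appear in as many groups as needed because every group is summed independently from the same base table.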

EDIT: A general solution is not easy, because the rows of a shared category have to be repeated under each merged label, like:

# df is the original data with columns X and Y, not the crosstab from above
# Copy the B rows so they are also counted under 'B or C'
df1 = df[df['X'] == 'B'].assign(X = 'B or C')
# Keep the C rows aside so C still gets its own row
df2 = df[df['X'] == 'C']
df = pd.concat([df, df1], ignore_index=True)
df['X'] = df['X'].replace({'A':'A or B', 'B': 'A or B', 'C': 'B or C'})
df = pd.concat([df, df2], ignore_index=True)
df = pd.crosstab(df.X, df.Y)
print (df)
Y       D  E
X           
A or B  3  3
B or C  3  3
C       1  1
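
The row duplication can also be expressed as a lookup from each original category to every label it should be counted under, followed by `explode`. Again this is a sketch on the question's sample data rather than a definitive implementation; `membership` and `expanded` are hypothetical names, and `DataFrame.explode` needs pandas 0.25 or later:

```python
import pandas as pd

df = pd.DataFrame({'X': list('ABBBACCB'), 'Y': list('DDEDEDEE')})

# Each original category lists every output row it contributes to
membership = {
    'A': ['A or B'],
    'B': ['A or B', 'B or C'],
    'C': ['B or C', 'C'],
}

# map() puts a list of labels in each row; explode() makes one row per label.
# explode repeats the original index labels, so rebuild a unique index
# before handing the columns to crosstab.
expanded = (
    df.assign(X=df['X'].map(membership))
      .explode('X')
      .reset_index(drop=True)
)

result = pd.crosstab(expanded['X'], expanded['Y'])
print(result)
```

Adding another overlapping group then only means editing `membership`. Any category missing from the dictionary would map to NaN and drop out of the table, so every category has to be listed.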
