January 4, 2020 01:43:02


Pandas Groupby result into a separate dataframe

Question

Say there is a dataframe with 100 records containing 4 (or n) columns; an example of the dataframe is below:

 id  target   col3   col4
 00     0      ..     ..
 00     0      ..     ..
 00     0      ..     ..
 01     1      ..     ..
 01     1      ..     ..
 01     0      ..     ..
 01     1      ..     ..
 02     1      ..     ..
 02     0      ..     ..
 02     1      ..     ..
 02     0      ..     ..
 ..
 ..

Based on this dataframe, I want to create a new dataframe that is the result of a groupby on id combined with value_counts of a specific column (target).

I have figured out how to get those values (my current code):

for group_id, group in df.groupby('id'):
    print(group_id)
    print(group['target'].value_counts())

Which gives me the following output:

00
0    3
Name: target, dtype: int64
01
0    1
1    3
Name: target, dtype: int64
02
0    2
1    2
Name: target, dtype: int64
..
..

I am able to get these values, but I can't seem to pass them into an empty dataframe. I would like to create a new dataframe that represents this information in this format:

id   0   1
00   3  NaN
01   1   3
02   2   2
..
..

To do this, you can count the rows for each (id, target) pair with size() and reshape the result with unstack():

result_df = df.groupby(['id', 'target']).size().unstack(fill_value=0).reset_index()
result_df.columns.name = None

This creates a new dataframe, result_df, with id as a regular column and one column per target value (0 and 1). Because of fill_value=0, combinations that never occur are filled with 0 rather than NaN.
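The question's own loop can also be made to work: instead of printing each group's counts, collect them in a dict and build the dataframe once at the end, which avoids appending to an empty frame. The sketch below assumes a small sample frame with the same id/target values used in Answer 2 (col3 and col4 are left out because they do not affect the counts). It also shows the shorter groupby(...)['target'].value_counts().unstack() idiom, which should produce the same table.

```python
import pandas as pd

# Sample data with the question's id/target layout (extra columns omitted).
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Option 1: keep the loop, but store each group's counts in a dict
# instead of printing them, then build the DataFrame in one step.
counts = {}
for group_id, group in df.groupby('id'):
    counts[group_id] = group['target'].value_counts().to_dict()

from_loop = (pd.DataFrame.from_dict(counts, orient='index')
               .sort_index()          # rows in id order
               .sort_index(axis=1)    # target columns in order 0, 1, ...
               .rename_axis('id')
               .reset_index())

# Option 2: let pandas do the loop. value_counts on a grouped column returns
# a Series indexed by (id, target); unstack() turns the target level into
# columns, leaving NaN where an id never had that target value.
from_unstack = df.groupby('id')['target'].value_counts().unstack().reset_index()
from_unstack.columns.name = None

print(from_loop)
print(from_unstack)
```

Building the dict first and converting once is generally cheaper and simpler than growing an empty dataframe row by row inside the loop.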

Answer 1

Score: 2

Here's one way to do it:

df = (df
     .groupby('id')
     .apply(lambda f: f['target'].value_counts().to_frame())
     .unstack()
     .reset_index())
df.columns = ['id', 0, 1]
print(df)

   id    0    1
0   0  3.0  NaN
1   1  1.0  3.0
2   2  2.0  2.0
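The line df.columns = ['id', 0, 1] assumes exactly two target classes and would fail with a length mismatch if a third value appeared. A minimal sketch of a more general variant, assuming the same groupby/apply chain, drops the extra column level that unstack() produces instead of hard-coding the labels. The two rows for id '03' (introducing a hypothetical target value 2) are made up purely to illustrate this.

```python
import pandas as pd

# Sample data from the question plus two hypothetical rows for id '03'
# that introduce a third target class (2).
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01',
           '02', '02', '02', '02', '03', '03'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 2, 0],
})

wide = (df
        .groupby('id')
        .apply(lambda f: f['target'].value_counts().to_frame())
        .unstack())

# unstack() leaves two column levels: the value_counts column name on top
# and the target values below. Dropping the top level keeps whatever target
# values exist, rather than assuming they are exactly 0 and 1.
wide.columns = wide.columns.droplevel(0)
result = wide.reset_index()
result.columns.name = None
print(result)
```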


Answer 2

Score: 2

You can build a simple pivot table with .pivot_table(), using 'size' as the aggfunc:

import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)
# Count the rows for each (id, target) pair; missing pairs show up as NaN
print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

target    0    1
id              
00      3.0  NaN
01      1.0  3.0
02      2.0  2.0
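If zeros are preferred over NaN, and id should be an ordinary column as in the question's desired layout, pivot_table also accepts a fill_value argument, and reset_index() moves id out of the index. A minimal sketch, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# fill_value=0 replaces the NaN for (id, target) pairs that never occur.
table = df.pivot_table(index='id', columns='target', aggfunc='size', fill_value=0)

# Turn the id index into a regular column and drop the 'target' axis label.
result = table.reset_index()
result.columns.name = None
print(result)
```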


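Answer 3

Score: 1

You can use pandas' crosstab function to achieve this. pd.crosstab computes a frequency table of the values of two (or more) factors; see the pandas documentation for pd.crosstab for details.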

import pandas as pd
import numpy as np
d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)
print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

Prints:

target    0    1
id              
00      3.0  NaN
01      1.0  3.0
02      2.0  2.0
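pd.crosstab already fills pairs that never occur with 0, so the .replace(0, np.nan) step is only needed when NaN is specifically wanted. A minimal sketch, assuming the same sample data, that keeps the zeros and reshapes the result into the question's layout. It also shows crosstab's normalize='index' option, which turns each row's counts into shares of that id's total.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Raw counts; pairs that never occur are 0 rather than NaN.
counts = pd.crosstab(index=df['id'], columns=df['target'])

# Question's layout: id as an ordinary column, no 'target' axis label.
result = counts.reset_index()
result.columns.name = None
print(result)

# normalize='index' divides each row by its total, giving the share of
# each target value within an id instead of the raw count.
shares = pd.crosstab(index=df['id'], columns=df['target'], normalize='index')
print(shares)
```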

英文:

