Pandas Groupby result into a separate dataframe

Question

Say there is a dataframe with 100 records and 4 (or n) columns, for example:

 id  target   col3   col4
 00     0      ..     ..
 00     0      ..     ..
 00     0      ..     ..
 01     1      ..     ..
 01     1      ..     ..
 01     0      ..     ..
 01     1      ..     ..
 02     1      ..     ..
 02     0      ..     ..
 02     1      ..     ..
 02     0      ..     ..
 ..
 ..

Based on this dataframe, I want to create a new dataframe that holds the result of a groupby on id combined with value_counts on a specific column (target).

I have figured out how to get those values (my current code):

for id, group in df.groupby('id'):
    print(id)
    print(group.target.value_counts())

Which gives me the following output:

00
0    3
Name: target, dtype: int64
01
0    1
1    3
Name: target, dtype: int64
02
0    2
1    2
Name: target, dtype: int64
..
..

I am able to get these values, but I can't seem to put them into an empty dataframe. I would like to create a new dataframe that represents this information in this format:

id   0   1
00   3  NaN
01   1   3
02   2   2
..
..

One way to do this is to count the rows for each (id, target) pair with groupby and size, then unstack the target level into columns:

result_df = df.groupby(['id', 'target']).size().unstack(fill_value=0).reset_index()
result_df.columns.name = None

This creates a new dataframe result_df in which id is a regular column, 0 and 1 are the values of the target column, and combinations that never occur are filled with 0 instead of NaN.
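
The loop in the question already computes the right numbers; what is missing is a way to collect them. A minimal sketch, assuming sample data that mirrors the ids and targets shown above (the names counts_by_id and result are made up for illustration): store each value_counts Series in a dict keyed by id, then let the DataFrame constructor align them.

```python
import pandas as pd

# Assumed sample data matching the ids and targets shown in the question.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# One value_counts Series per id, keyed by that id.
counts_by_id = {
    group_id: group['target'].value_counts()
    for group_id, group in df.groupby('id')
}

# A dict of Series becomes one column per key, aligned on the target values.
# Transposing turns each id into a row; missing target values show up as NaN.
result = (pd.DataFrame(counts_by_id)
          .T
          .rename_axis('id')
          .reset_index())
print(result)
```

Building the frame once from a dict tends to be simpler than appending rows to an empty dataframe inside the loop.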

Answer 1

Score: 2

Here's one way to do it (the final column assignment assumes target takes exactly two values, 0 and 1):

df = (df
     .groupby('id')
     .apply(lambda f: f['target'].value_counts().to_frame())
     .unstack()
     .reset_index())

df.columns = ['id', 0, 1]
print(df)

   id    0    1
0   0  3.0  NaN
1   1  1.0  3.0
2   2  2.0  2.0
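
If target can take more than two values, hardcoding the column names no longer fits. A possible variant, sketched here on assumed sample data (the name counts is illustrative), calls value_counts directly on the grouped column and lets unstack derive the columns:

```python
import pandas as pd

# Assumed sample data; ids are strings so the leading zeros are kept.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

counts = (df.groupby('id')['target']
            .value_counts()   # Series indexed by (id, target)
            .unstack()        # target values become columns; missing pairs are NaN
            .reset_index())   # turn id back into a regular column
counts.columns.name = None    # drop the leftover 'target' label on the columns
print(counts)
```

This avoids apply and produces one column per distinct target value, whatever their number.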

Answer 2

Score: 2

You can build a simple pivot table with .pivot_table(), using 'size' as the aggfunc:

import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# Count the rows for every (id, target) pair; pairs that never occur become NaN.
print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

target    0    1
id              
00      3.0  NaN
01      1.0  3.0
02      2.0  2.0
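
The counts print as floats (3.0, 1.0) because a NaN forces the column to a float dtype. If whole-number counts are preferred, two options can be sketched, assuming the same sample data (table_nullable and table_zero are illustrative names): cast to pandas' nullable Int64 dtype to keep the missing marker, or fill the gaps with 0 and cast to a plain int.

```python
import pandas as pd

# Assumed sample data, as in the answer above.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

table = df.pivot_table(index='id', columns='target', aggfunc='size')

# Option 1: nullable integers; absent pairs are shown as <NA>.
table_nullable = table.astype('Int64')

# Option 2: treat absent pairs as a count of zero.
table_zero = table.fillna(0).astype(int)

print(table_nullable)
print(table_zero)
```

Which one fits depends on whether "never observed" should stay distinguishable from a count of zero.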

Answer 3

Score: 1

You can use pandas' crosstab function for this. pd.crosstab computes a frequency table of two (or more) factors; see the pandas.crosstab documentation for details.

import pandas as pd
import numpy as np

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

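Without the trailing .replace(0, np.nan), the crosstab prints the table below; with it, the zero count for id 00 becomes NaN:

target  0  1
id
00      3  0
01      1  3
02      2  2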
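
To get from the crosstab to the flat layout asked for in the question, the zero counts can be masked and the index reset. A small sketch, assuming the same sample data (ct, masked and flat are illustrative names); it uses where instead of replace, which is one of several ways to blank out the zeros:

```python
import pandas as pd

# Assumed sample data, as in the answer above.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Frequency table: rows are ids, columns are target values, absent pairs are 0.
ct = pd.crosstab(index=df['id'], columns=df['target'])

# Keep the non-zero counts and put NaN everywhere else.
masked = ct.where(ct != 0)

# Flatten: id becomes a regular column and the 'target' axis label is dropped.
flat = masked.reset_index()
flat.columns.name = None
print(flat)
```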

huangapple
  • Posted on January 4, 2020, 01:43:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59583022.html