Pandas Groupby result into a separate dataframe
Question
Say there is a dataframe with 100 records and 4 (or n) columns, for example:

id  target  col3  col4
00  0       ..    ..
00  0       ..    ..
00  0       ..    ..
01  1       ..    ..
01  1       ..    ..
01  0       ..    ..
01  1       ..    ..
02  1       ..    ..
02  0       ..    ..
02  1       ..    ..
02  0       ..    ..
..

Based on this dataframe I want to create a new dataframe that is the result of a groupby on id followed by value_counts on a specific column (target).

I have figured out how to get those values (my current code):

for id, group in df.groupby('id'):
    print(id)
    print(group.target.value_counts())

This gives me the following output:

00
0    3
Name: target, dtype: int64
01
0    1
1    3
Name: target, dtype: int64
02
0    2
1    2
Name: target, dtype: int64
..

I am able to get these values, but I can't seem to pass them into an empty dataframe. I would like to create a new dataframe that represents this information in this format:

id    0    1
00    3  NaN
01    1    3
02    2    2
..

One way to do this is to count the rows for each (id, target) pair with groupby(...).size() and reshape the result into the desired format with unstack:

result_df = df.groupby(['id', 'target']).size().unstack(fill_value=0).reset_index()
result_df.columns.name = None

This creates a new dataframe result_df in which id is a regular column, 0 and 1 (the values of target) are the count columns, and combinations that never occur are filled with 0 instead of NaN.
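The loop in the question already computes the right numbers; what is missing is a place to collect them. A minimal sketch of that idea follows, assuming a small sample frame that mirrors the example table (col3 and col4 are left out, and the names counts and result are only illustrative). Rather than appending to an empty dataframe, it stores each value_counts() result in a dict and builds the dataframe in one step:

```python
import pandas as pd

# Sample data mirroring the example table in the question (col3/col4 omitted).
df = pd.DataFrame({
    "id": ["00"] * 3 + ["01"] * 4 + ["02"] * 4,
    "target": [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Same loop as in the question, but each value_counts() Series is stored
# under its group key instead of being printed.
counts = {key: group["target"].value_counts() for key, group in df.groupby("id")}

# A dict of Series becomes one column per key; transpose so each id is a row,
# then order the target columns.
result = pd.DataFrame(counts).T.sort_index(axis=1)

# Turn the id index into a regular column and drop the leftover axis label.
result.index.name = "id"
result.columns.name = None
result = result.reset_index()
print(result)
```

Target values that never occur for an id are left as NaN by the alignment step, which is why the count columns come out as floats.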
Answer 1
Score: 2
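Here's one way to do it:

df = (df
      .groupby('id')
      .apply(lambda f: f['target'].value_counts().to_frame())  # per-id counts of each target value
      .unstack()        # move the target values from the index into columns
      .reset_index())   # turn id back into a regular column
df.columns = ['id', 0, 1]  # flatten the column labels
print(df)

Output:

   id    0    1
0   0  3.0  NaN
1   1  1.0  3.0
2   2  2.0  2.0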
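The apply/lambda step above can probably be avoided, since a grouped column has its own value_counts() method. The following is a sketch of that shorter variant rather than the answer author's code; the sample frame and the name result are assumptions made for illustration:

```python
import pandas as pd

# Assumed sample data matching the ids and targets shown in the question.
df = pd.DataFrame({
    "id": ["00"] * 3 + ["01"] * 4 + ["02"] * 4,
    "target": [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# value_counts() on the grouped column returns a Series indexed by (id, target).
# unstack() moves the target level into columns and leaves NaN where a
# combination never occurs.
result = df.groupby("id")["target"].value_counts().unstack().reset_index()
result.columns.name = None  # drop the leftover "target" label on the columns
print(result)
```

Because the column labels come from the data, this version does not need the hard-coded df.columns = ['id', 0, 1] assignment and keeps working if target has more than two distinct values.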
Answer 2
Score: 2
You can use .pivot_table() with 'size' as the aggfunc:

import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# count the rows for each (id, target) combination
print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

target    0    1
id
00      3.0  NaN
01      1.0  3.0
02      2.0  2.0
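If zeros are preferred over NaN for combinations that never occur, pivot_table also accepts a fill_value argument. A short sketch under the same sample data (the variable name table is illustrative):

```python
import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# Same pivot as above, but missing (id, target) combinations are reported
# as 0 instead of NaN.
table = df.pivot_table(index='id', columns='target', aggfunc='size', fill_value=0)
print(table)
```

Here id stays in the index, so rows can be looked up directly, for example table.loc['00', 1].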
Answer 3
Score: 1
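You can use the pandas crosstab function to achieve this. pd.crosstab computes a frequency table of two (or more) factors. See the pandas documentation for pd.crosstab for more details.

import pandas as pd
import numpy as np

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# crosstab reports combinations that never occur as 0; replace those with NaN
print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

Without the .replace(0, np.nan) step, crosstab prints:

target  0  1
id
00      3  0
01      1  3
02      2  2

With it, the zero count for id 00 and target 1 becomes NaN, matching the format requested in the question.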
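The crosstab result keeps id as the index and carries a target label on the columns, whereas the question asks for id as an ordinary column. A minimal sketch of that last flattening step, assuming the same sample data (the names table and flat are illustrative):

```python
import numpy as np
import pandas as pd

d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
     'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
df = pd.DataFrame(d)

# Frequency table with zero counts turned into NaN, as in the answer above.
table = pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan)

# Move id out of the index and remove the "target" label from the columns.
flat = table.reset_index()
flat.columns.name = None
print(flat)
```

The same two lines should work on the pivot_table result as well, since both return a dataframe indexed by id with target values as columns.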