Pandas Groupby result into a separate dataframe

Question

Say there is a dataframe with 100 records and 4 (or n) columns, for example:

  id  target  col3  col4
  00  0       ..    ..
  00  0       ..    ..
  00  0       ..    ..
  01  1       ..    ..
  01  1       ..    ..
  01  0       ..    ..
  01  1       ..    ..
  02  1       ..    ..
  02  0       ..    ..
  02  1       ..    ..
  02  0       ..    ..
  ..
  ..

Based on this dataframe, I want to create a new dataframe that is the result of a groupby on id combined with value_counts on a specific column (target).

I have figured out how to get those values (my current code):

  for id, group in df.groupby('id'):
      print(id)
      print(group['target'].value_counts())

Which gives me the following output:

  00
  0    3
  Name: target, dtype: int64
  01
  0    1
  1    3
  Name: target, dtype: int64
  02
  0    2
  1    2
  Name: target, dtype: int64
  ..
  ..

I am able to get these values, but I can't seem to pass them into an empty dataframe. I would like to create a new dataframe that represents this information in this format:

  id  0  1
  00  3  NaN
  01  1  3
  02  2  2
  ..
  ..

To do this, you can count the rows for each (id, target) pair with groupby(...).size() and reshape the counts into the desired layout with unstack:

  result_df = df.groupby(['id', 'target']).size().unstack(fill_value=0).reset_index()
  result_df.columns.name = None

This creates a new dataframe result_df in which id is an ordinary column (because of reset_index), 0 and 1 are the values of the target column, and combinations that never occur are filled with 0 instead of NaN.
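The loop from the question can also be kept almost as written: instead of printing each group's counts, store them and build the dataframe once at the end. The following is a minimal sketch, assuming sample data that mirrors the ids and targets shown in the question; the names counts_by_id and result are illustrative, not from the original post.

```python
import pandas as pd

# Sample data mirroring the ids and targets shown in the question (assumed).
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Same loop as in the question, but each group's counts are stored
# in a dict keyed by id instead of being printed.
counts_by_id = {
    group_id: group['target'].value_counts()
    for group_id, group in df.groupby('id')
}

# Each stored Series becomes a column; transposing puts one id per row.
# Target values that never occur for an id appear as NaN.
result = pd.DataFrame(counts_by_id).T.rename_axis('id').reset_index()
print(result)
```

Because the counts are aligned on the target values when the dataframe is built, an id with no rows for a given target gets NaN in that column, which matches the layout requested in the question.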

Answer 1

Score: 2

Here's one way to do it:

  # Count the target values within each id, then spread the counts into columns.
  df = (df
        .groupby('id')
        .apply(lambda f: f['target'].value_counts().to_frame())
        .unstack()
        .reset_index())
  # Flatten the column labels; this assumes target only takes the values 0 and 1.
  df.columns = ['id', 0, 1]
  print(df)

Output:

     id    0    1
  0   0  3.0  NaN
  1   1  1.0  3.0
  2   2  2.0  2.0

The ids print as 0, 1 and 2 here, presumably because id was stored as an integer column in this run.
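The same reshaping can be written without apply, since a grouped column has its own value_counts method. This is a sketch of that variant rather than the answerer's code, and it assumes sample data matching the question; counts and result are illustrative names.

```python
import pandas as pd

# Sample data mirroring the ids and targets shown in the question (assumed).
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# value_counts on the grouped column returns a Series indexed by (id, target);
# unstack moves the target level into the columns, leaving NaN for gaps.
counts = df.groupby('id')['target'].value_counts().unstack()

# Turn id back into a regular column and drop the 'target' label on the columns.
result = counts.reset_index().rename_axis(None, axis=1)
print(result)
```

Unlike the hard-coded column list, this version takes its column labels from whatever target values are present, so it does not depend on target being limited to 0 and 1.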

Answer 2

Score: 2

You can build a simple pivot table with .pivot_table(), using 'size' as the aggfunc:

  import pandas as pd

  d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
       'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
  df = pd.DataFrame(d)
  # aggfunc='size' counts the rows that fall into each (id, target) cell.
  print(df.pivot_table(columns='target', index='id', aggfunc='size'))

Prints:

  target    0    1
  id
  00      3.0  NaN
  01      1.0  3.0
  02      2.0  2.0
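The pivot table holds floats because of the NaN cell. If integer counts are preferred, with 0 for combinations that never occur, the result can be post-processed. A minimal sketch, assuming the same sample data as above; the variable names counts and filled are illustrative.

```python
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Row counts per (id, target) cell; missing combinations are NaN.
counts = df.pivot_table(index='id', columns='target', aggfunc='size')

# Replace the missing cells with 0 and convert the counts back to integers.
filled = counts.fillna(0).astype(int)
print(filled)
```

Whether NaN or 0 is the better marker depends on the use: NaN keeps "never observed" distinct from a count, while 0 keeps the table purely numeric.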

Answer 3

Score: 1

You can use the pandas crosstab function to achieve this. pd.crosstab computes a frequency table of two factors, here how often each target value occurs for each id.

  import pandas as pd
  import numpy as np

  d = {'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
       'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0]}
  df = pd.DataFrame(d)
  # crosstab yields 0 for missing combinations; replace turns those zeros into NaN.
  print(pd.crosstab(index=df['id'], columns=df['target']).replace(0, np.nan))

The crosstab itself, before .replace(0, np.nan) is applied, prints:

  target  0  1
  id
  00      3  0
  01      1  3
  02      2  2

With the replace call, the 0 in row 00 is shown as NaN instead.
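To get from the crosstab to the flat layout requested in the question, with id as a regular column, the index can be reset and the column label removed. This is a sketch under the same sample data; counts, with_nan and flat are illustrative names.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer above.
df = pd.DataFrame({
    'id': ['00', '00', '00', '01', '01', '01', '01', '02', '02', '02', '02'],
    'target': [0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
})

# Frequency table: one row per id, one column per target value, 0 for gaps.
counts = pd.crosstab(index=df['id'], columns=df['target'])

# Mark combinations that never occur as NaN rather than 0.
with_nan = counts.replace(0, np.nan)

# Move id out of the index and drop the 'target' label above the columns.
flat = with_nan.reset_index()
flat.columns.name = None
print(flat)
```

Note that replacing 0 with NaN is only safe here because crosstab counts are never negative and a 0 always means "no rows"; skip that step if a numeric 0 is what the downstream code expects.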

huangapple
  • Published on January 4, 2020, 01:43:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59583022.html