Group by and aggregate the columns in pandas dataframe

Question

I have the following DataFrame, which I would like to group by one column, aggregating the unique values of the other columns in each group with a separator such as ' | '. Here are some sample rows:

    col1                                             col2            col3              col4
    THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1  {3-M syndrome 1}  273750
    THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 2  {3-M syndrome 2}  273750

I would like to group by 'col1' and aggregate the unique values of the other columns. The expected DataFrame is:

    col1                                             col2                             col3                                 col4
    THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}  273750

I am using the following lines of code:

    # Join the distinct values of each column within a group
    join_unique = lambda x: ' | '.join(x.unique())
    df2 = df.groupby(['col1'], as_index=False).agg(join_unique)

I get output but col4 is not included in the output.

    col1                                             col2                             col3
    THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}

Any help is highly appreciated.

Answer 1

Score: 0

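This is probably because col4 contains integers: ' | '.join(...) only accepts strings, so the aggregation fails for that column, and older pandas versions drop such a column from the result. You could try an if/else like this: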

    import pandas as pd

    data = {'col1': {0: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}',
                     1: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}'},
            'col2': {0: '3-m syndrome 1', 1: '3-m syndrome 2'},
            'col3': {0: '{3-M syndrome 1}', 1: '{3-M syndrome 2}'},
            'col4': {0: 273750, 1: 273750}}
    df = pd.DataFrame(data)

    # Join only when a group has more than one distinct value;
    # otherwise return the single value unchanged (so integers survive).
    >>> df.groupby('col1').agg(lambda x: ' | '.join(x.unique()) if x.nunique() > 1 else x.unique()[0])
    Out:
                                                     col2                             col3                                 col4
    col1
    THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}  273750
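
Note that the if/else above would still fail if an integer column had more than one distinct value within a group, because the join branch would then receive integers. One possible variation, sketched here as an assumption rather than as part of the original answer, is to cast every value to a string before joining. The helper name `join_unique` is taken from the question; the `dropna()` call and the string cast are additions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}"] * 2,
    "col2": ["3-m syndrome 1", "3-m syndrome 2"],
    "col3": ["{3-M syndrome 1}", "{3-M syndrome 2}"],
    "col4": [273750, 273750],
})

# str.join only accepts strings, so joining the raw integers of col4 fails.
try:
    " | ".join(df["col4"].unique())
except TypeError as exc:
    print("join failed:", exc)


def join_unique(values: pd.Series) -> str:
    """Join the distinct non-null values of one column as strings."""
    # Assumption: missing values should be skipped rather than shown as 'nan'.
    return " | ".join(values.dropna().astype(str).unique())


# Every non-key column is aggregated, including the integer column col4.
df2 = df.groupby("col1", as_index=False).agg(join_unique)
print(df2.to_dict("records"))
```

With this approach col4 comes back as the string '273750' rather than an integer, so convert it back (for example with `pd.to_numeric`) if a numeric column is needed afterwards.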

huangapple
  • Posted on July 12, 2023 at 20:40:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76670659.html