Group by and aggregate the columns in a pandas DataFrame

Question

I have the following DataFrame, which I would like to group by one column while aggregating the unique values of the other columns in each group, joined by a separator such as ' | '. Sample rows:

col1                                             col2            col3              col4
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1  {3-M syndrome 1}  273750
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 2  {3-M syndrome 2}  273750

I would like to group by 'col1' and aggregate the unique values of the other columns. The expected DataFrame is:

col1                                             col2                             col3                                 col4
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}  273750

I am using the following lines of code:

# Join the unique values of each column within a group with ' | '
join_unique = lambda x: ' | '.join(x.unique())
df2 = df.groupby(['col1'], as_index=False).agg(join_unique)

I get output, but col4 is not included in it:

col1                                             col2                             col3
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}

Any help is highly appreciated.

Answer 1

Score: 0

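It could be because col4 contains integers: str.join only accepts strings, so the join doesn't work for that column. You could try an if/else like this, which joins only when a group has more than one unique value and otherwise keeps the single value as it is: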

import pandas as pd

data = {'col1': {0: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}',
  1: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}'},
 'col2': {0: '3-m syndrome 1', 1: '3-m syndrome 2'},
 'col3': {0: '{3-M syndrome 1}', 1: '{3-M syndrome 2}'},
 'col4': {0: 273750, 1: 273750}}

df = pd.DataFrame(data)

# Join only when a group has several unique values; a single value is returned unchanged
df.groupby('col1').agg(lambda x: ' | '.join(x.unique()) if x.nunique() > 1 else x.unique()[0])

Out:

                                                 col2                             col3                                 col4
col1
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}  3-m syndrome 1 | 3-m syndrome 2  {3-M syndrome 1} | {3-M syndrome 2}  273750
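
Note that the if/else above still calls str.join on the raw values whenever a group has more than one unique value, so an integer column with differing values in one group would fail in the same way. A possible alternative, sketched here under the assumption that it is acceptable for every aggregated column to come back as a string, is to convert the values to str before joining. The helper name join_unique is reused from the question, and its sep parameter is an illustrative addition.

```python
import pandas as pd


def join_unique(values: pd.Series, sep: str = ' | ') -> str:
    """Join the unique non-null values of one group as strings (illustrative helper)."""
    # Series.unique() keeps the order of first appearance; map(str, ...) lets
    # integers and other non-string values go through str.join.
    return sep.join(map(str, values.dropna().unique()))


# Sample data taken from the question; col4 holds integers.
df = pd.DataFrame({
    'col1': ['THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}'] * 2,
    'col2': ['3-m syndrome 1', '3-m syndrome 2'],
    'col3': ['{3-M syndrome 1}', '{3-M syndrome 2}'],
    'col4': [273750, 273750],
})

# as_index=False keeps col1 as a regular column, as in the question's code.
df2 = df.groupby('col1', as_index=False).agg(join_unique)
print(df2.to_string(index=False))
```

With this approach col4 is kept in the result, but as the string '273750' rather than an integer. If the numeric dtype matters, the column would need to be converted back afterwards or aggregated separately.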

huangapple
  • Posted on July 12, 2023, 20:40:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76670659.html