Group by and aggregate the columns in pandas dataframe
Question
I have the following DataFrame, which I would like to group by a certain column while joining the unique values of each other column within a group using a separator such as ' | '. Below are the sample rows:
col1 col2 col3 col4
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)} 3-m syndrome 1 {3-M syndrome 1} 273750
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)} 3-m syndrome 2 {3-M syndrome 2} 273750
I would like to group by 'col1' and join the unique values of each remaining column. The expected DataFrame is:
col1 col2 col3 col4
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)} 3-m syndrome 1 | 3-m syndrome 2 {3-M syndrome 1} | {3-M syndrome 2} 273750
I am using the following lines of code:
join_unique = lambda x: ' | '.join(x.unique())
df2 = df.groupby(['col1'], as_index=False).agg(join_unique)
I get output, but col4 is missing from it:
col1 col2 col3
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)} 3-m syndrome 1 | 3-m syndrome 2 {3-M syndrome 1} | {3-M syndrome 2}
Any help is highly appreciated.
Answer 1
Score: 0
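It is probably because col4 contains integers: str.join accepts only strings, so the aggregation raises a TypeError on that column, and older pandas versions (before 2.0) silently dropped columns on which the aggregation failed. You could try an if/else like this, which joins only when a group has more than one unique value: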
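import pandas as pd

data = {'col1': {0: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}',
                 1: 'THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}'},
        'col2': {0: '3-m syndrome 1', 1: '3-m syndrome 2'},
        'col3': {0: '{3-M syndrome 1}', 1: '{3-M syndrome 2}'},
        'col4': {0: 273750, 1: 273750}}
df = pd.DataFrame(data)
# Join only when a group has several unique values; otherwise keep the single value unchanged.
# Note: a group holding several distinct integers would still fail inside join().
df.groupby('col1').agg(lambda x: ' | '.join(x.unique()) if x.nunique() > 1 else x.unique()[0])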
Out:
col2 col3 col4
col1
THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)} 3-m syndrome 1 | 3-m syndrome 2 {3-M syndrome 1} | {3-M syndrome 2} 273750
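The if/else above still fails when a group holds more than one distinct integer, because join() is then called on integers. A possible variant, offered only as a sketch, casts every value to a string before joining. The helper name join_unique_str and the third sample row (with col4 = 273751) are made up for illustration and are not part of the original data. Note that col4 comes back as strings with this approach.

```python
import pandas as pd


def join_unique_str(values: pd.Series, sep: str = " | ") -> str:
    """Join the distinct non-null values of one group, in order of first appearance."""
    # Cast to str so that integer (or other non-string) columns can be joined too.
    return sep.join(values.dropna().astype(str).unique())


title = "THREE M SYNDROME 1 {3-M syndrome 1, 273750 (3)}"
df = pd.DataFrame({
    "col1": [title, title, title],
    "col2": ["3-m syndrome 1", "3-m syndrome 2", "3-m syndrome 2"],
    "col3": ["{3-M syndrome 1}", "{3-M syndrome 2}", "{3-M syndrome 2}"],
    # The last value is hypothetical, added to show a group with two distinct integers.
    "col4": [273750, 273750, 273751],
})

result = df.groupby("col1", as_index=False).agg(join_unique_str)
print(result)
```

If col4 has to stay numeric whenever a group has a single value, the if/else from the answer can be combined with the cast, so that only groups with several unique values are converted to strings.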