Group rows and add columns (delete repeated lines)

Question

How can I group rows by name and move the values of each repeated row into new columns, so that every name appears on a single line?

See the example:

import pandas as pd

df = pd.DataFrame({
    'name': ['Andy', 'Bob', 'Chad', 'Andy', 'Chad', 'Bob', 'George', 'Hank'],
    'col_1': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4'],
    'col_2': [1, 1, 2, 2, 1, 1, 2, 2]
    })

df.groupby(by="name")
df

This generates the following result:

     name col_1  col_2
0    Andy    A1      1
1     Bob    A2      1
2    Chad    A3      2
3    Andy    A4      2
4    Chad    B1      1
5     Bob    B2      1
6  George    B3      2
7    Hank    B4      2

But I need it to look like this:

  name    col_1   col_2    col_1   col_2
0 Andy    A1      1        A4      2
1 Bob     A2      1        B2      1
2 Chad    A3      2        B1      1 
3 George  B3      2        
4 Hank    B4      2

Thanks

Answer 1

Score: 1

Try:

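# Number each row within its name group: 0 for the first occurrence, 1 for the second, ...
df['col'] = df.groupby('name').cumcount()
# Pivot on that counter, then reorder so the columns of each occurrence sit side by side
out = df.pivot(index='name', columns='col').swaplevel(axis=1).sort_index(axis=1).fillna('')
# Flatten the (occurrence, column) pairs into names such as col_1_0
out.columns = [f'{b}_{a}' for a, b in out.columns]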

print(out)

Prints:

       col_1_0  col_2_0 col_1_1 col_2_1
name                                   
Andy        A1      1.0      A4     2.0
Bob         A2      1.0      B2     1.0
Chad        A3      2.0      B1     1.0
George      B3      2.0                
Hank        B4      2.0                
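
For anyone who wants to see what each step of that chain does, here is a minimal sketch that splits it into intermediate variables. The names `occurrence`, `wide`, `filled` and `typed` are mine and not part of the answer, and the last step (`convert_dtypes`) is an optional variation I am assuming you might want if the `1.0` / `2.0` floats bother you. It is not what the answer itself does.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Andy', 'Bob', 'Chad', 'Andy', 'Chad', 'Bob', 'George', 'Hank'],
    'col_1': ['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4'],
    'col_2': [1, 1, 2, 2, 1, 1, 2, 2],
})

# Step 1: position of each row inside its name group
# (0 = first time the name appears, 1 = second time, ...).
occurrence = df.groupby('name').cumcount()

# Step 2: pivot on that counter without mutating df (assign returns a copy).
# The columns become (original column, occurrence) pairs.
wide = df.assign(col=occurrence).pivot(index='name', columns='col')

# Step 3: put the occurrence first and sort, so col_1/col_2 of the same
# occurrence end up next to each other.
wide = wide.swaplevel(axis=1).sort_index(axis=1)

# Step 4: flatten the two-level column index into plain strings.
wide.columns = [f'{orig}_{occ}' for occ, orig in wide.columns]

# Same final step as the answer: blank out the missing cells.
# Pivoting introduces NaN, so the numeric values are shown as floats.
filled = wide.fillna('')

# Optional variation (my assumption, not in the answer): keep integers by
# switching to pandas' nullable dtypes; missing cells then show as <NA>.
typed = wide.convert_dtypes()

print(filled)
print(typed)
```

Which of the two you keep depends on whether you need blank strings for display or real integers for further computation.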

huangapple
  • Posted on June 16, 2023, 06:02:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76485784.html