Reshape dataframe from long to wide
Question
My DataFrame:
import pandas as pd

d = {'project_id': [19, 20, 19, 20, 19, 20],
     'task_id': [11, 22, 11, 22, 11, 22],
     'task': ['task_1', 'task_1', 'task_1', 'task_1', 'task_1', 'task_1'],
     'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
     'image_id': [101, 202, 303, 404, 505, 606],
     'frame': [0, 0, 9, 8, 11, 11],
     'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar']}
df = pd.DataFrame(data=d)
My DataFrame is in long format. Some values are duplicated, and only image_id is unique.
I am trying to reshape it to wide format by username, using pd.pivot and pd.merge.
My code:
pd.pivot(df, index=['task', 'frame', 'image_id'], columns='username', values='label')
As you can see, I don't really need image_id in the output. I just want a summary of which label each user applied per frame.
Answer 1
Score: 1
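You can chain groupby.first after the pivot to collapse the image_id level, keeping the first non-null label per user in each (task, frame) group: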
(pd.pivot(df, index=['task','frame','image_id'],
          columns='username', values='label')
   .groupby(level=['task','frame']).first()
)
Or use pivot_table with aggfunc='first':
pd.pivot_table(df, index=['task','frame'],
               columns='username', values='label',
               aggfunc='first')   
Output:
username      jery   tom
task   frame            
task_1 0       foo   foo
       8       xyz  None
       9      None   bar
       11      bar   bar
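To check that the two approaches agree on the sample data, a minimal self-contained sketch follows. The helper to_records and the names wide_a and wide_b are hypothetical, introduced only for this comparison; the helper flattens each wide result into a plain dict so that differences in how missing cells are represented (None versus NaN) do not affect the check.

```python
import pandas as pd

# Sample data from the question
d = {'project_id': [19, 20, 19, 20, 19, 20],
     'task_id': [11, 22, 11, 22, 11, 22],
     'task': ['task_1'] * 6,
     'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
     'image_id': [101, 202, 303, 404, 505, 606],
     'frame': [0, 0, 9, 8, 11, 11],
     'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar']}
df = pd.DataFrame(data=d)

# Approach 1: pivot with image_id in the index, then collapse that level
wide_a = (pd.pivot(df, index=['task', 'frame', 'image_id'],
                   columns='username', values='label')
            .groupby(level=['task', 'frame']).first())

# Approach 2: pivot_table aggregates duplicates directly
wide_b = pd.pivot_table(df, index=['task', 'frame'],
                        columns='username', values='label',
                        aggfunc='first')


def to_records(wide):
    """Hypothetical helper: flatten a wide frame to
    {(task, frame, username): label}, skipping missing cells."""
    return {
        (idx[0], int(idx[1]), user): label
        for idx, row in wide.iterrows()
        for user, label in row.items()
        if pd.notna(label)
    }


records_a = to_records(wide_a)
records_b = to_records(wide_b)
print(records_a == records_b)
```

Note that 'first' keeps only one label when a user has several rows for the same task and frame. That is harmless for this sample, where each user has at most one row per frame, but it is worth checking on real data.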