Reshape dataframe from long to wide

Question

My df:

    import pandas as pd

    d = {'project_id': [19, 20, 19, 20, 19, 20],
         'task_id': [11, 22, 11, 22, 11, 22],
         'task': ['task_1', 'task_1', 'task_1', 'task_1', 'task_1', 'task_1'],
         'username': ['tom', 'jery', 'tom', 'jery', 'tom', 'jery'],
         'image_id': [101, 202, 303, 404, 505, 606],
         'frame': [0, 0, 9, 8, 11, 11],
         'label': ['foo', 'foo', 'bar', 'xyz', 'bar', 'bar']}
    df = pd.DataFrame(data=d)

So my df is in long format and contains some duplicates; only image_id is unique.
I am trying to reshape it to wide format by username with pd.pivot and pd.merge.
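Whether image_id really has to be part of the pivot index can be checked directly, since pd.pivot requires every (index, columns) combination to occur at most once. A minimal sketch on the sample df (rebuilt here so the snippet runs on its own):

```python
import pandas as pd

d = {"project_id": [19, 20, 19, 20, 19, 20],
     "task_id": [11, 22, 11, 22, 11, 22],
     "task": ["task_1"] * 6,
     "username": ["tom", "jery", "tom", "jery", "tom", "jery"],
     "image_id": [101, 202, 303, 404, 505, 606],
     "frame": [0, 0, 9, 8, 11, 11],
     "label": ["foo", "foo", "bar", "xyz", "bar", "bar"]}
df = pd.DataFrame(data=d)

# image_id identifies each row on its own.
print(df["image_id"].is_unique)

# Pivoting on index=["task", "frame"], columns="username" only fails if some
# (task, frame, username) combination appears more than once.
dup_keys = df.duplicated(subset=["task", "frame", "username"], keep=False)
print(dup_keys.any())
```

For this sample it prints True and then False: no (task, frame, username) key repeats, so pivoting on ['task', 'frame'] alone would already work here. If the full data does repeat such keys, pd.pivot raises a ValueError about duplicate entries, and an aggregating approach (groupby.first or pivot_table) is needed.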
My code:

    pd.pivot(df, index=['task', 'frame', 'image_id'], columns='username', values='label')
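Because image_id is unique, putting it in the index gives every source row its own output row, so each row holds a label for only one user. A short sketch that checks this on the sample data (only the columns the pivot uses are rebuilt):

```python
import pandas as pd

df = pd.DataFrame({
    "task": ["task_1"] * 6,
    "username": ["tom", "jery", "tom", "jery", "tom", "jery"],
    "image_id": [101, 202, 303, 404, 505, 606],
    "frame": [0, 0, 9, 8, 11, 11],
    "label": ["foo", "foo", "bar", "xyz", "bar", "bar"],
})

wide = pd.pivot(df, index=["task", "frame", "image_id"],
                columns="username", values="label")

# One row per image_id (6 rows) and one column per user.
print(wide.shape)
# Each row has a non-missing label in exactly one user column.
print(wide.notna().sum(axis=1).tolist())
```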

My output:
[Image: output of the code above]

What I expected (or want to get):
[Image: expected output]

So, as you can see, I don't really need image_id in the output. I just want a summary of which label each user used in each frame.

Answer 1

Score: 1

You can add a groupby.first after the pivot:

    (pd.pivot(df, index=['task', 'frame', 'image_id'],
              columns='username', values='label')
       .groupby(level=['task', 'frame']).first()
    )
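This works because GroupBy.first returns, for each group and each column, the first non-missing value, so the per-image rows of one frame collapse into a single row with every user's label. A minimal sketch on a hand-built two-row frame shaped like the pivot result for frame 0:

```python
import numpy as np
import pandas as pd

# Two rows for the same (task, frame), each with only one user's label filled in,
# as produced by pivoting with image_id in the index.
wide = pd.DataFrame(
    {"jery": [np.nan, "foo"], "tom": ["foo", np.nan]},
    index=pd.MultiIndex.from_tuples(
        [("task_1", 0, 101), ("task_1", 0, 202)],
        names=["task", "frame", "image_id"],
    ),
)

# first() skips missing values column by column, so both labels survive.
collapsed = wide.groupby(level=["task", "frame"]).first()
print(collapsed)
```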

Or use pivot_table with aggfunc='first':

    pd.pivot_table(df, index=['task', 'frame'],
                   columns='username', values='label',
                   aggfunc='first')
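Note that aggfunc='first' keeps only one label when a user has several labels on the same frame. If all of them should be kept, aggfunc also accepts a callable; the sketch below joins them into one string. The extra row (tom labelling frame 0 a second time, image_id 707) is hypothetical, added only to create such a duplicate.

```python
import pandas as pd

# Sample data plus one hypothetical extra row: tom labels frame 0 twice.
df = pd.DataFrame({
    "task": ["task_1"] * 7,
    "username": ["tom", "jery", "tom", "jery", "tom", "jery", "tom"],
    "image_id": [101, 202, 303, 404, 505, 606, 707],
    "frame": [0, 0, 9, 8, 11, 11, 0],
    "label": ["foo", "foo", "bar", "xyz", "bar", "bar", "bar"],
})

# Join every label a user gave on a frame instead of keeping only the first one.
joined = pd.pivot_table(df, index=["task", "frame"],
                        columns="username", values="label",
                        aggfunc=lambda s: ", ".join(s))
print(joined)
```

Cells with no label for a user (for example tom on frame 8) stay missing, as with aggfunc='first'.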

Output:

    username     jery   tom
    task   frame
    task_1 0      foo   foo
           8      xyz  None
           9     None   bar
           11     bar   bar
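If a flat table with ordinary columns is more convenient than the (task, frame) MultiIndex, the result can be flattened afterwards. A small sketch, assuming the pivot_table variant above:

```python
import pandas as pd

df = pd.DataFrame({
    "task": ["task_1"] * 6,
    "username": ["tom", "jery", "tom", "jery", "tom", "jery"],
    "image_id": [101, 202, 303, 404, 505, 606],
    "frame": [0, 0, 9, 8, 11, 11],
    "label": ["foo", "foo", "bar", "xyz", "bar", "bar"],
})

out = pd.pivot_table(df, index=["task", "frame"], columns="username",
                     values="label", aggfunc="first")

# Move task/frame back into regular columns and drop the "username" axis name.
flat = out.reset_index().rename_axis(columns=None)
print(flat)
```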

huangapple
  • Published on 6 February 2023, 20:23:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75361276.html