Combine rows with the same IDs into the same row

Question

I have a dataset with 600 rows. The data has a primary ID, Version, and a secondary ID, Task. The data looks like this:

    Version  Task  Concept  Att 1 -  Att 2 -
          1     1        1        3        2
          1     1        2        1        1
          1     2        1        2        3
          1     2        2        1        2
          1     3        1        2        3
          1     3        2        3        1
          2     1        1        2        1
          2     1        2        3        2
          2     2        1        2        2
          2     2        2        1        3
          2     3        1        3        1
          2     3        2        1        3

I would like to change the format so that the rows for the same "Task" within the same "Version" end up in a single row, like this:

    Version  Task  Concept  Att 1 -  Att 2 -  Version  Task  Concept  Att 1 -  Att 2 -
          1     1        1        3        2        1     1        2        1        1
          1     2        1        2        3        1     2        2        1        2
          1     3        1        2        3        1     3        2        3        1
          2     1        1        2        1        2     1        2        3        2
          2     2        1        2        2        2     2        2        1        3
          2     3        1        3        1        2     3        2        1        3

I have tried different approaches, such as groupby and pivot, but I cannot find the right solution.

Answer 1

Score: 0

I think a pivot is the clean way to reshape: df.pivot(index=['Version', 'Task'], columns='Concept'), optionally followed by flattening the resulting column MultiIndex.
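As a rough illustration of the pivot route, here is a minimal sketch. It assumes the data is in a DataFrame named df with the column names shown in the question, and that pandas is recent enough (1.1 or later) to accept a list for index. The sample frame is copied from the question, and the "_c<concept>" suffix used when flattening is just an illustrative naming choice, not something the question asks for.

```python
import pandas as pd

# Sample data copied from the question (primary ID: Version, secondary ID: Task).
df = pd.DataFrame({
    "Version": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Task":    [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "Concept": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "Att 1 -": [3, 1, 2, 1, 2, 3, 2, 3, 2, 1, 3, 1],
    "Att 2 -": [2, 1, 3, 2, 3, 1, 1, 2, 2, 3, 1, 3],
})

# One row per (Version, Task); the Concept values become a second column level.
wide = df.pivot(index=["Version", "Task"], columns="Concept")

# Flatten the (attribute, concept) MultiIndex into single, unique column names.
# The "_c<concept>" suffix is an assumption made for this example.
wide.columns = [f"{att}_c{concept}" for att, concept in wide.columns]

# Bring Version and Task back as ordinary columns.
wide = wide.reset_index()
```

Unlike the layout requested in the question, this keeps Version and Task only once per row and gives every attribute column a unique name, which tends to be easier to work with afterwards.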

That said, if you really want to duplicate the columns, you could combine a groupby and concat:

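    import pandas as pd

    # Index each Concept group by (Version, Task), keeping those columns,
    # then align the groups side by side on that index.
    out = (pd.concat([g.set_index(['Version', 'Task'], drop=False)
                      for _, g in df.groupby('Concept')], axis=1)
             .reset_index(drop=True)
          )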

Output:

       Version  Task  Concept  Att 1 -  Att 2 -  Version  Task  Concept  Att 1 -  Att 2 -
    0        1     1        1        3        2        1     1        2        1        1
    1        1     2        1        2        3        1     2        2        1        2
    2        1     3        1        2        3        1     3        2        3        1
    3        2     1        1        2        1        2     1        2        3        2
    4        2     2        1        2        2        2     2        2        1        3
    5        2     3        1        3        1        2     3        2        1        3
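Note that the output above contains duplicate column names, so selecting a column such as 'Att 1 -' returns both copies at once. If that gets in the way, one possible variant (a sketch, not part of the original answer) is to pass a dict to pd.concat so that each Concept value becomes the outer level of the column index. The sample frame is copied from the question; the names blocks and att1_concept2 are made up for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Version": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "Task":    [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "Concept": [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    "Att 1 -": [3, 1, 2, 1, 2, 3, 2, 3, 2, 1, 3, 1],
    "Att 2 -": [2, 1, 3, 2, 3, 1, 1, 2, 2, 3, 1, 3],
})

# Same groupby + concat idea, but the dict keys (the Concept values) become
# the outer level of the column index, so every column stays addressable.
blocks = {concept: g.set_index(["Version", "Task"])
          for concept, g in df.groupby("Concept")}
out = pd.concat(blocks, axis=1)

# Select one attribute for one concept without ambiguity.
att1_concept2 = out[(2, "Att 1 -")]
```

Here Version and Task stay in the row index rather than being repeated as columns; calling out.reset_index() would turn them back into columns if needed.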

huangapple
  • Posted on May 17, 2023, 19:32:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76271626.html