Dataframe transform in Python

Question

I am trying to transform a dataframe in Python. I'm happy to use Pandas or NumPy if it will do the job.

The original dataframe looks like this:

```
A        B        D     E
foo-1    bar-6    C1    11
foo-2    bar-5    C2    12
foo-3    bar-4    C1    13
foo-4    bar-3    C1    14
foo-5    bar-2    C2    15
foo-6    bar-1    C2    16
```

And I am trying to transform it into this:

```
A        B        C1    C2
foo-1    bar-6    11    NAN
foo-2    bar-5    NAN   12
foo-3    bar-4    13    NAN
foo-4    bar-3    14    NAN
foo-5    bar-2    NAN   15
foo-6    bar-1    NAN   16
```

or into this, after which I would drop columns D and E:

```
A        B        D    E    C1    C2
foo-1    bar-6    C1   11   11    NAN
foo-2    bar-5    C2   12   NAN   12
foo-3    bar-4    C1   13   13    NAN
foo-4    bar-3    C1   14   14    NAN
foo-5    bar-2    C2   15   NAN   15
foo-6    bar-1    C2   16   NAN   16
```

I have tried this:

```python
for row in dataframe.index:
    dataframe[dataframe[D]] = dataframe[E]
```

but I get the wrong results.
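
The attempt above appears to go wrong because `dataframe[dataframe[D]]` indexes with the whole `D` column at once instead of the current row's label, and `row` is never used inside the loop. A minimal sketch of the same row-by-row idea, assuming `D` and `E` were meant to be the column names `'D'` and `'E'`, reads and writes one cell at a time with `.loc`. It produces the second layout (D and E kept, C1 and C2 added). It is slow on large frames, so the vectorized answers are usually preferable.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

# Assumes the column names are the strings 'D' and 'E' (not variables)
for row in df.index:
    # Use this row's label (e.g. 'C1') as the target column name;
    # .loc creates the column on first use and fills the other rows with NaN
    df.loc[row, df.loc[row, 'D']] = df.loc[row, 'E']

print(df)
```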

Answer 1

Score: 1

Try pivoting part of the original dataframe and then joining the result back:

```python
import pandas as pd
# df is the question's original DataFrame (columns A, B, D, E)
out = df.join(pd.pivot(df[['D', 'E']], columns='D', values='E'))
print(out)
```

Prints:

```
       A      B   D   E    C1    C2
0  foo-1  bar-6  C1  11  11.0   NaN
1  foo-2  bar-5  C2  12   NaN  12.0
2  foo-3  bar-4  C1  13  13.0   NaN
3  foo-4  bar-3  C1  14  14.0   NaN
4  foo-5  bar-2  C2  15   NaN  15.0
5  foo-6  bar-1  C2  16   NaN  16.0
```
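
As a self-contained sketch of the same pivot-then-join approach (the question's sample data is rebuilt inline), dropping D and E afterwards gives the first requested layout. Because `pd.pivot` is called without `index`, it keeps the original row index, which is what lets `join` line the new C1/C2 columns up with the original rows.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

# Pivot only D/E: each E value lands in the column named by its D label,
# the other cell in that row is NaN, and the original row index is kept
wide = pd.pivot(df[['D', 'E']], columns='D', values='E')

# Join on the shared index, then drop D and E to get the first requested layout
out = df.join(wide).drop(columns=['D', 'E'])
print(out)
```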

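Answer 2

Score: 1

This question comes up a lot because it is not obvious what the operation is called. One name for it is a pivot (as in a pivot table); it is also the inverse of the stack operation. So you can use unstack, as in the answer by @ScottBenson, or the DataFrame.pivot method: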

```python
# A list-valued index requires pandas 1.1 or later
df.pivot(index=['A', 'B'], columns='D', values='E')
```

Output:

```
D              C1    C2
A     B
foo-1 bar-6  11.0   NaN
foo-2 bar-5   NaN  12.0
foo-3 bar-4  13.0   NaN
foo-4 bar-3  14.0   NaN
foo-5 bar-2   NaN  15.0
foo-6 bar-1   NaN  16.0
```
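
To make the stack/unstack relationship concrete, here is a small sketch (sample data rebuilt inline) showing that `set_index(...)` followed by `unstack` produces the same frame as `df.pivot`, and how `reset_index` turns A and B back into ordinary columns. Both `pivot` and `unstack` raise an error if an (A, B, D) combination occurs more than once; `pivot_table` with an aggregation function is the usual alternative in that case.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

# pivot: rows keyed by (A, B), one column per D label, cells filled from E
by_pivot = df.pivot(index=['A', 'B'], columns='D', values='E')

# unstack: the same reshape, spelled as "index on A, B, D, then move D to the columns"
by_unstack = df.set_index(['A', 'B', 'D'])['E'].unstack('D')

# Turn A and B back into ordinary columns and clear the 'D' label on the columns
flat = by_pivot.reset_index()
flat.columns.name = None
print(flat)
```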

Answer 3

Score: 0

Here is a fairly simple solution. Let me know if you have any questions :)

```python
import numpy as np
import pandas as pd

data = {
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16]
}
df = pd.DataFrame(data)

# Route each E value into C1 or C2 according to the label in D;
# the other column gets a real NaN (np.nan), not the string 'NAN'
c1, c2 = [], []
for label, value in zip(df['D'], df['E']):
    if label == 'C1':
        c1.append(value)
        c2.append(np.nan)
    else:
        c1.append(np.nan)
        c2.append(value)

# Build the result from A and B plus the two new columns (D and E are dropped)
df = df[['A', 'B']].assign(C1=c1, C2=c2)
print(df)
```

Output:

```
       A      B    C1    C2
0  foo-1  bar-6  11.0   NaN
1  foo-2  bar-5   NaN  12.0
2  foo-3  bar-4  13.0   NaN
3  foo-4  bar-3  14.0   NaN
4  foo-5  bar-2   NaN  15.0
5  foo-6  bar-1   NaN  16.0
```
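
The loop can also be written without iterating over rows. The sketch below uses `Series.where`, which keeps a value where the condition is true and puts NaN elsewhere, to build one column per distinct label in D; it is an alternative to the answer above, not a transcription of it.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

out = df[['A', 'B']].copy()
# One new column per distinct label in D (here 'C1' then 'C2', in order of first appearance);
# where() keeps E on rows with that label and yields NaN on all other rows
for label in df['D'].unique():
    out[label] = df['E'].where(df['D'] == label)

print(out)
```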

huangapple
  • Published on June 13, 2023 at 07:58:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76460938.html