Dataframe transform in Python

Question

I am trying to transform a dataframe in Python; I'm happy to use pandas or NumPy if either will do the job.

The original dataframe looks like this:

A        B        D     E 
foo-1    bar-6    C1    11
foo-2    bar-5    C2    12
foo-3    bar-4    C1    13
foo-4    bar-3    C1    14
foo-5    bar-2    C2    15
foo-6    bar-1    C2    16

I am trying to transform it into this:

A        B        C1    C2
foo-1    bar-6    11    NAN
foo-2    bar-5    NAN   12
foo-3    bar-4    13    NAN
foo-4    bar-3    14    NAN
foo-5    bar-2    NAN   15
foo-6    bar-1    NAN   16

or into this, after which I will drop columns D and E:

A        B        D    E    C1    C2
foo-1    bar-6    C1   11   11    NAN
foo-2    bar-5    C2   12   NAN   12
foo-3    bar-4    C1   13   13    NAN
foo-4    bar-3    C1   14   14    NAN
foo-5    bar-2    C2   15   NAN   15
foo-6    bar-1    C2   16   NAN   16

I have tried this:

for row in dataframe.index:
    # `row` is never used, so every iteration repeats the same whole-column assignment
    dataframe[dataframe['D']] = dataframe['E']

but I get the wrong results.

Answer 1

Score: 1

Try pivoting just columns D and E of the original dataframe, then join the result back onto it:

import pandas as pd

out = df.join(pd.pivot(df[['D', 'E']], columns='D', values='E'))
print(out)

Prints:

       A      B   D   E    C1    C2
0  foo-1  bar-6  C1  11  11.0   NaN
1  foo-2  bar-5  C2  12   NaN  12.0
2  foo-3  bar-4  C1  13  13.0   NaN
3  foo-4  bar-3  C1  14  14.0   NaN
4  foo-5  bar-2  C2  15   NaN  15.0
5  foo-6  bar-1  C2  16   NaN  16.0
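For reference, here is a self-contained version of the same pivot-and-join idea. It is a sketch that assumes pandas is installed and rebuilds the question's sample data; it calls the method form `DataFrame.pivot` instead of `pd.pivot` (both are part of pandas), and the trailing `.drop` is an addition that reaches the first target layout directly.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

# Pivot only D/E: the existing row index (0..5) is kept and
# each distinct label in D becomes its own column holding E
wide = df[['D', 'E']].pivot(columns='D', values='E')

# Join back on the shared row index, then drop the now-redundant D and E
out = df.join(wide).drop(columns=['D', 'E'])
print(out)
```

Leaving off `.drop(columns=['D', 'E'])` gives the second layout from the question, with D and E kept alongside C1 and C2.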

Answer 2

Score: 1

This question comes up a lot because it is not obvious what the operation is called. One name for it is a pivot (as in a pivot table); it is also the inverse of the stack operation. So you can use unstack, as in the answer by @ScottBenson, or the DataFrame.pivot method:

df.pivot(index=['A', 'B'], columns='D', values='E')

Output

D              C1    C2
A     B                
foo-1 bar-6  11.0   NaN
foo-2 bar-5   NaN  12.0
foo-3 bar-4  13.0   NaN
foo-4 bar-3  14.0   NaN
foo-5 bar-2   NaN  15.0
foo-6 bar-1   NaN  16.0
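To see that pivot and unstack perform the same reshape here, and to get plain A, B, C1, C2 columns instead of an (A, B) row index, a sketch along these lines should work. It assumes pandas 1.1 or later, where `pivot` accepts a list for `index`; the variable names are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

# pivot: (A, B) pairs become the row index, labels in D become columns
wide = df.pivot(index=['A', 'B'], columns='D', values='E')

# unstack: put D into the index as well, then move that level out to columns
via_unstack = df.set_index(['A', 'B', 'D'])['E'].unstack('D')

# Flatten back to ordinary columns and clear the leftover 'D' axis name
flat = wide.reset_index().rename_axis(columns=None)
print(flat)
```

One caveat: `pivot` raises a `ValueError` if any (A, B, D) combination occurs more than once, whereas `pd.pivot_table` aggregates duplicates (by mean, by default).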

Answer 3

Score: 0

Here is a fairly simple solution. Let me know if you have any questions (:

import numpy as np
import pandas as pd

data = {
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16]
}
df = pd.DataFrame(data)

# Route each E value into C1 or C2 according to the label in D,
# using a real NaN (not the string 'NAN') for the empty slot
c1, c2 = [], []
for label, value in zip(df['D'], df['E']):
    if label == 'C1':
        c1.append(value)
        c2.append(np.nan)
    else:
        c1.append(np.nan)
        c2.append(value)

# Building new columns avoids chained assignment such as df['E'][c] = ...
df = pd.DataFrame({'A': df['A'], 'B': df['B'], 'C1': c1, 'C2': c2})
print(df)

       A      B    C1    C2
0  foo-1  bar-6  11.0   NaN
1  foo-2  bar-5   NaN  12.0
2  foo-3  bar-4  13.0   NaN
3  foo-4  bar-3  14.0   NaN
4  foo-5  bar-2   NaN  15.0
5  foo-6  bar-1   NaN  16.0
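The loop above handles exactly two labels and visits every row. A vectorised sketch that creates one column per distinct label in D, whatever those labels are, might look like this. It is not part of the original answer; it assumes pandas and uses `Series.where`, which keeps values where the condition holds and fills NaN elsewhere.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'A': ['foo-1', 'foo-2', 'foo-3', 'foo-4', 'foo-5', 'foo-6'],
    'B': ['bar-6', 'bar-5', 'bar-4', 'bar-3', 'bar-2', 'bar-1'],
    'D': ['C1', 'C2', 'C1', 'C1', 'C2', 'C2'],
    'E': [11, 12, 13, 14, 15, 16],
})

out = df[['A', 'B']].copy()
# One iteration per distinct label in D (in order of first appearance),
# not per row: E is kept where D matches the label, NaN elsewhere
for label in df['D'].unique():
    out[label] = df['E'].where(df['D'] == label)
print(out)
```

Because the new column names come from the data, a third label such as C3 would simply produce a third column without any code change.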

huangapple
  • Published on 13 June 2023 at 07:58:35
  • Please keep this link when reposting: https://go.coder-hub.com/76460938.html