Transform row categories into columns while preserving the rest of the data frame in Python

Question

I have a data frame like the one below:

         Time Groups Entity    GC   Seg       Category  Year  Quarter IndicatorName          Value
0  2021-06-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        2          yhat  568759.481223
1  2021-07-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3          yhat  586003.965652
2  2021-08-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3          yhat  583703.420655
3  2021-09-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3             y  608601.857510
4  2021-10-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        4             y  628928.602344

I want to turn the IndicatorName categories into columns, each holding the corresponding Value, while keeping the rest of the columns.

I tried pivot and melt, but neither gave me the desired result.

The closest I have come is this:

grouper = df.groupby('IndicatorName')
out = pd.concat([pd.Series(v['Value'].tolist(), name=k) for k, v in grouper], axis=1)

         y  	yhat
0	8626.88	5.687595e+05
1	8215.30	5.860040e+05
2	8601.53	5.837034e+05
3	8145.16	6.086019e+05
4	9376.81	6.289286e+05
...	...	...
744	NaN	5.402358e+06
745	NaN	5.796123e+06
746	NaN	5.218829e+06
747	NaN	5.451504e+06

But I want to keep all the existing columns and add yhat and y as additional columns.

Any help/suggestion would be much appreciated.

Thanks in advance!

Answer 1

Score: 2

Are you looking for pivot_table?

>>> (df.pivot_table(index=df.columns[:-2].tolist(), columns=['IndicatorName'], values='Value')
...    .reset_index().rename_axis(columns=None))

        Time Groups Entity    GC   Seg       Category  Year  Quarter              y           yhat
0 2021-06-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        2            NaN  568759.481223
1 2021-07-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  586003.965652
2 2021-08-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  583703.420655
3 2021-09-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3  608601.857510            NaN
4 2021-10-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        4  628928.602344            NaN

Change [:-2] to [:-1] if you want to keep the IndicatorName column.
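
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the sample frame from the question. It makes two assumptions that are not in the original post: the Seg column holds the string "None" rather than a real missing value, and Time is parsed as a datetime. The first one matters because pivot_table, with its default settings, typically drops rows whose index keys are missing.

```python
import pandas as pd

# Sample data copied from the question. Seg is stored as the string "None"
# (an assumption): real missing values in the index columns would typically
# cause those rows to be dropped by pivot_table.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2021-06-01", "2021-07-01", "2021-08-01",
                            "2021-09-01", "2021-10-01"]),
    "Groups": ["KRO"] * 5,
    "Entity": ["CO"] * 5,
    "GC": ["P_GA"] * 5,
    "Seg": ["None"] * 5,
    "Category": ["Model_Q2_2021"] * 5,
    "Year": [2021] * 5,
    "Quarter": [2, 3, 3, 3, 4],
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.481223, 586003.965652, 583703.420655,
              608601.857510, 628928.602344],
})

# Every column except the last two (IndicatorName, Value) identifies a row.
id_cols = df.columns[:-2].tolist()

wide = (
    df.pivot_table(index=id_cols, columns="IndicatorName", values="Value")
      .reset_index()               # turn the identifier columns back into regular columns
      .rename_axis(columns=None)   # drop the leftover "IndicatorName" axis label
)
print(wide)
```

Note that pivot_table aggregates with the mean by default, so if two rows share the same identifier columns and the same IndicatorName, their values are averaged rather than kept separately.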

Answer 2

Score: 0

Add the two columns you want, and fill each with df['Value'] wherever df['IndicatorName'] matches the column's name:

df['yhat'] = df['Value']*(df['IndicatorName'] == 'yhat')
df['y'] = df['Value']*(df['IndicatorName'] == 'y')

This should give you:

    Time        (...) IndicatorName        Value      yhat         y
0   2021-06-01  (...)          yhat    568759.48 568759.48         0
1   2021-07-01  (...)          yhat    586003.96 586003.96         0
2   2021-08-01  (...)          yhat    583703.42 583703.42         0
3   2021-09-01  (...)             y    608601.85         0 608601.85
4   2021-10-01  (...)             y    628928.60         0 628928.60
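
Multiplying by a boolean mask puts 0 in the rows that do not match, which can be mistaken for a real measurement of zero. If NaN is preferred there, as in the pivot_table output, a possible variant uses Series.where instead. This is only a sketch on a trimmed-down copy of the sample data (just three of the question's columns, to keep it short):

```python
import pandas as pd

# Trimmed-down version of the question's frame; the other columns are
# omitted here only for brevity and would be preserved unchanged.
df = pd.DataFrame({
    "Time": pd.to_datetime(["2021-06-01", "2021-07-01", "2021-08-01",
                            "2021-09-01", "2021-10-01"]),
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.481223, 586003.965652, 583703.420655,
              608601.857510, 628928.602344],
})

# Series.where keeps Value where the condition is True and fills NaN elsewhere.
for name in ["yhat", "y"]:
    df[name] = df["Value"].where(df["IndicatorName"] == name)

print(df)
```

Unlike the pivot_table approach, this keeps IndicatorName and Value in place and never merges rows, so the number of rows stays the same.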

huangapple
  • Posted on February 8, 2023, 21:13:34
  • Original link: https://go.coder-hub.com/75386327.html