Transform row categories into columns while preserving the rest of the DataFrame in Python

Question

I have a DataFrame like the one below:

             Time  Groups  Entity    GC   Seg       Category  Year  Quarter  IndicatorName          Value
    0  2021-06-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        2           yhat  568759.481223
    1  2021-07-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3           yhat  586003.965652
    2  2021-08-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3           yhat  583703.420655
    3  2021-09-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3              y  608601.857510
    4  2021-10-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        4              y  628928.602344

I want to turn the IndicatorName categories into columns, each holding its corresponding Value, while keeping the rest of the columns.

I tried pivot and melt, but neither gave me the desired result.

The closest I have gotten is this:

    grouper = df.groupby('IndicatorName')
    out = pd.concat([pd.Series(v['Value'].tolist(), name=k) for k, v in grouper], axis=1)

               y          yhat
    0    8626.88  5.687595e+05
    1    8215.30  5.860040e+05
    2    8601.53  5.837034e+05
    3    8145.16  6.086019e+05
    4    9376.81  6.289286e+05
    ...      ...           ...
    744      NaN  5.402358e+06
    745      NaN  5.796123e+06
    746      NaN  5.218829e+06
    747      NaN  5.451504e+06

But I want to keep all the existing columns and add yhat and y as extra columns.

Any help/suggestion would be much appreciated.

Thanks in advance!

Answer 1

Score: 2

Are you looking for pivot_table?

    >>> (df.pivot_table(index=df.columns[:-2].tolist(), columns=['IndicatorName'], values='Value')
    ...    .reset_index().rename_axis(columns=None))
             Time  Groups  Entity    GC   Seg       Category  Year  Quarter              y           yhat
    0  2021-06-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        2            NaN  568759.481223
    1  2021-07-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3            NaN  586003.965652
    2  2021-08-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3            NaN  583703.420655
    3  2021-09-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3  608601.857510            NaN
    4  2021-10-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        4  628928.602344            NaN

You can change [:-2] to [:-1] if you want to keep the IndicatorName column.
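For anyone who wants to try this without the original data, here is a minimal self-contained sketch of the same pivot_table call. The sample frame is rebuilt from the five rows shown in the question, and the helper name id_cols is only illustrative. It assumes Seg holds the string "None" rather than a real missing value, since by default pivot_table drops rows whose index keys are missing.

```python
import pandas as pd

# Sample data rebuilt from the rows in the question.
# Assumption: "Seg" is the literal string "None", not a missing value.
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01"]
    ),
    "Groups": "KRO",
    "Entity": "CO",
    "GC": "P_GA",
    "Seg": "None",
    "Category": "Model_Q2_2021",
    "Year": 2021,
    "Quarter": [2, 3, 3, 3, 4],
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.481223, 586003.965652, 583703.420655,
              608601.857510, 628928.602344],
})

# Every column except the category and value columns identifies a row.
# (id_cols is a hypothetical name; the answer uses df.columns[:-2] instead.)
id_cols = [c for c in df.columns if c not in ("IndicatorName", "Value")]

wide = (
    df.pivot_table(index=id_cols, columns="IndicatorName", values="Value")
      .reset_index()               # turn the identifying columns back into regular columns
      .rename_axis(columns=None)   # drop the leftover "IndicatorName" axis label
)

print(wide)
```

Keep in mind that pivot_table aggregates with the mean by default, so if two rows share the same identifying columns and the same IndicatorName, their values are averaged rather than kept separately.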

Answer 2

Score: 0

Add the two columns you want, and fill each with df['Value'] wherever df['IndicatorName'] matches that column's name:

    df['yhat'] = df['Value'] * (df['IndicatorName'] == 'yhat')
    df['y'] = df['Value'] * (df['IndicatorName'] == 'y')

This should give you:

             Time  (...)  IndicatorName      Value       yhat          y
    0  2021-06-01  (...)           yhat  568759.48  568759.48          0
    1  2021-07-01  (...)           yhat  586003.96  586003.96          0
    2  2021-08-01  (...)           yhat  583703.42  583703.42          0
    3  2021-09-01  (...)              y  608601.85          0  608601.85
    4  2021-10-01  (...)              y  628928.60          0  628928.60
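If you would rather see missing values than zeros in the rows where an indicator does not apply, one possible variant (not part of the answer above) is Series.where, which keeps a value where the condition holds and fills NaN elsewhere. The sketch below uses a cut-down frame with only the two relevant columns, and loops over the indicator names so they do not have to be typed by hand.

```python
import pandas as pd

# Cut-down sample: only the two columns that matter for this step.
df = pd.DataFrame({
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.48, 586003.96, 583703.42, 608601.85, 628928.60],
})

# Add one new column per distinct indicator name.
for name in df["IndicatorName"].unique():
    # Keep Value where the row belongs to this indicator, NaN otherwise.
    df[name] = df["Value"].where(df["IndicatorName"] == name)

print(df)
```

This keeps every original row and column, which matches what the question asks for, and NaN makes it harder to mistake "no value" for a genuine zero.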

huangapple
  • Posted on February 8, 2023 at 21:13:34
  • When reposting, please retain the link to this article: https://go.coder-hub.com/75386327.html