2023-02-08 21:13:34

Transform row categories into columns while preserving the rest of the DataFrame (Python)

Question

I have a DataFrame like the one below:

         Time  Groups  Entity    GC   Seg       Category  Year  Quarter  IndicatorName          Value
0  2021-06-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        2           yhat  568759.481223
1  2021-07-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3           yhat  586003.965652
2  2021-08-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3           yhat  583703.420655
3  2021-09-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        3              y  608601.857510
4  2021-10-01     KRO      CO  P_GA  None  Model_Q2_2021  2021        4              y  628928.602344

I want to turn the IndicatorName categories into columns, each holding the corresponding Value, while keeping the rest of the columns.

I tried pivot and melt, but neither gave me the desired result.

The closest I have come is this:

grouper = df.groupby('IndicatorName')
out = pd.concat([pd.Series(v['Value'].tolist(), name=k) for k, v in grouper], axis=1)

           y          yhat
0    8626.88  5.687595e+05
1    8215.30  5.860040e+05
2    8601.53  5.837034e+05
3    8145.16  6.086019e+05
4    9376.81  6.289286e+05
...      ...           ...
744      NaN  5.402358e+06
745      NaN  5.796123e+06
746      NaN  5.218829e+06
747      NaN  5.451504e+06

But I want to keep all the columns and add yhat and y as additional columns.

Any help or suggestions would be much appreciated.

Thanks in advance!

Answer 1

Score: 2

Are you looking for pivot_table?

>>> (df.pivot_table(index=df.columns[:-2].tolist(), columns=['IndicatorName'], values='Value')
...    .reset_index().rename_axis(columns=None))
        Time Groups Entity    GC   Seg       Category  Year  Quarter              y           yhat
0 2021-06-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        2            NaN  568759.481223
1 2021-07-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  586003.965652
2 2021-08-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3            NaN  583703.420655
3 2021-09-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        3  608601.857510            NaN
4 2021-10-01    KRO     CO  P_GA  None  Model_Q2_2021  2021        4  628928.602344            NaN

You can replace [:-2] with [:-1] if you want to keep the IndicatorName column.
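
For reference, here is a minimal, self-contained sketch of the pivot_table approach above, built from the five sample rows in the question. The Seg values are entered as the string "None", which is an assumption; if they were real missing values, the grouping inside pivot_table could drop those rows. The names id_cols and wide are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question. Seg is the *string* "None" here
# (an assumption); real missing values in index columns may be dropped.
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01"]
    ),
    "Groups": ["KRO"] * 5,
    "Entity": ["CO"] * 5,
    "GC": ["P_GA"] * 5,
    "Seg": ["None"] * 5,
    "Category": ["Model_Q2_2021"] * 5,
    "Year": [2021] * 5,
    "Quarter": [2, 3, 3, 3, 4],
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.481223, 586003.965652, 583703.420655,
              608601.857510, 628928.602344],
})

# Every column except the last two (IndicatorName and Value) identifies a row.
id_cols = df.columns[:-2].tolist()

wide = (
    df.pivot_table(index=id_cols, columns="IndicatorName", values="Value")
      .reset_index()               # turn the identifying columns back into columns
      .rename_axis(columns=None)   # drop the leftover "IndicatorName" axis label
)
print(wide)
```

Note that pivot_table aggregates with the mean by default, so if several rows share the same identifying columns and the same indicator, their values are averaged into one cell.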

Answer 2

Score: 0

Add the two columns you want, and fill each with the value of df['Value'] where df['IndicatorName'] matches; multiplying by the boolean mask leaves 0 in the other rows:

df['yhat'] = df['Value'] * (df['IndicatorName'] == 'yhat')
df['y'] = df['Value'] * (df['IndicatorName'] == 'y')

This should give you:

    Time        (...) IndicatorName        Value      yhat         y
0   2021-06-01  (...)          yhat    568759.48 568759.48         0
1   2021-07-01  (...)          yhat    586003.96 586003.96         0
2   2021-08-01  (...)          yhat    583703.42 583703.42         0
3   2021-09-01  (...)             y    608601.85         0 608601.85
4   2021-10-01  (...)             y    628928.60         0 628928.60
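
If NaN is preferable to 0 in the rows where an indicator does not apply (as in the pivot_table output), one possible variant uses Series.where instead of multiplying by the mask. The sketch below uses a trimmed-down frame with only three of the question's columns, which is an assumption made for brevity; the loop over unique indicator names is also an illustrative choice, not part of the answer above.

```python
import pandas as pd

# Trimmed-down version of the question's data (only three columns kept).
df = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2021-06-01", "2021-07-01", "2021-08-01", "2021-09-01", "2021-10-01"]
    ),
    "IndicatorName": ["yhat", "yhat", "yhat", "y", "y"],
    "Value": [568759.481223, 586003.965652, 583703.420655,
              608601.857510, 628928.602344],
})

# One new column per indicator: keep Value where the indicator matches,
# and leave NaN (rather than 0) everywhere else.
for name in df["IndicatorName"].unique():
    df[name] = df["Value"].where(df["IndicatorName"] == name)

print(df)
```

This keeps every original row and column, including IndicatorName and Value, and does not aggregate anything.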
