How to convert a Pandas DataFrame to the shape of a correlation matrix

# Question

I have a pandas DataFrame that looks roughly like this:

```plaintext
             xvar            yvar  meanRsquared
0    filled_water          precip      0.119730
1    filled_water            snow      0.113214
2    filled_water  filled_wetland      0.119529
3  filled_wetland          precip      0.104826
4  filled_wetland            snow      0.121540
5  filled_wetland    filled_water      0.121540
[676 rows x 3 columns]
```

I would like to reshape it into a more traditional correlation matrix, where the columns and the index are the variables and the values are the meanRsquared.

Is there an easy way to do this? I've been playing around for an hour and can't figure out how.

Disclaimer: yes, I know pandas has a built-in function for creating a correlation matrix. However, my current DataFrame is the average of hundreds of correlation matrices over many watersheds, so I cannot use it.

This is my best attempt, but the logic obviously fails toward the end:

```python
listOfdicts = []
for xvar in df['xvar'].unique():
    for yvar in df['yvar'].unique():
        adict = {}
        adict['index'] = xvar
        adict[yvar] = yvar
        adict['r'] = df['insert r value here']
        listOfdicts.append(adict)
answer = pd.DataFrame.from_dict(listOfdicts)
```

I don't expect this to work, but it was my best shot.

# Answer 1

**Score**: 1

Take a look at the `pivot` method (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html).

```python
import pandas as pd

# Small stand-in for the long-format table in the question
df = pd.DataFrame(
    data={
        'xvar': ['filled_water', 'filled_water', 'filled_water',
                 'filled_wetland', 'filled_wetland', 'filled_wetland'],
        'yvar': ['precip', 'snow', 'filled_wetland',
                 'precip', 'snow', 'filled_water'],
        'meanRsquared': [1, 2, 3, 4, 5, 6]
    }, index=range(6)
)

# Rows come from xvar, columns from yvar, cells from meanRsquared
df.pivot(index='xvar', columns='yvar', values='meanRsquared')
```

Output:

```plaintext
yvar            filled_water  filled_wetland  precip  snow
xvar
filled_water             NaN             3.0     1.0   2.0
filled_wetland           6.0             NaN     4.0   5.0
```
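The pivoted result is only square when every variable shows up in both `xvar` and `yvar`. In the toy data it does not, so the output has 2 rows and 4 columns. Below is a minimal sketch of one way to force a square layout: reindex both axes to the union of all variable names. The final mirror-fill step assumes the matrix is meant to be symmetric, which is an assumption on my part and not something the question states. The names `labels`, `square` and `mirrored` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'xvar': ['filled_water', 'filled_water', 'filled_water',
             'filled_wetland', 'filled_wetland', 'filled_wetland'],
    'yvar': ['precip', 'snow', 'filled_wetland',
             'precip', 'snow', 'filled_water'],
    'meanRsquared': [1, 2, 3, 4, 5, 6],
})

wide = df.pivot(index='xvar', columns='yvar', values='meanRsquared')

# Every variable that appears on either side of a pair
labels = sorted(set(df['xvar']) | set(df['yvar']))

# Same labels on both axes; pairs that were never listed stay NaN
square = wide.reindex(index=labels, columns=labels)

# ASSUMPTION: the matrix should be symmetric, so a missing (a, b)
# cell can be filled from the (b, a) cell when that one exists.
mirrored = square.combine_first(square.T)
```

Cells that exist in neither direction (the diagonal, in this toy data) are left as NaN; whether to fill them with 1 or something else depends on what the averaged values mean.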
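One caveat: `pivot` raises a `ValueError` when the same (`xvar`, `yvar`) pair occurs in more than one row. If the long table has not been fully averaged yet, `pivot_table` with an aggregation function may be a better fit. A rough sketch, using a made-up table `long_df` with one repeated pair (the values are invented purely for illustration):

```python
import pandas as pd

# Hypothetical data: (filled_water, precip) appears twice
long_df = pd.DataFrame({
    'xvar': ['filled_water', 'filled_water', 'filled_wetland'],
    'yvar': ['precip', 'precip', 'precip'],
    'meanRsquared': [0.10, 0.20, 0.30],
})

# pivot cannot reshape when an index/column pair is duplicated
try:
    long_df.pivot(index='xvar', columns='yvar', values='meanRsquared')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table collapses the duplicates, here by taking their mean
wide = long_df.pivot_table(index='xvar', columns='yvar',
                           values='meanRsquared', aggfunc='mean')
```

With one row per pair, as in the question, plain `pivot` is enough and has the benefit of failing loudly if a duplicate sneaks in.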

huangapple
  • Published on July 7, 2023, 06:56:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76632984.html