Pandas DataFrame: replace all non-NaN values with the value of a specific column

huangapple

Question

I would like to transform a DataFrame so that every value that is not NaN is replaced with the value of the 'id' column in the same row.

Example:
From

import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})

to

df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': ['X', np.nan, 'Z'],
                   'B': ['X', 'Y', np.nan],
                   'C': [np.nan, 'Y', 'Z']})

Looping over the column and row indices would probably be very slow on large DataFrames, so I would prefer a solution that uses pandas functions.

Answer 1

Score: 1

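You can build a boolean mask of the non-NaN cells and multiply it by the 'id' strings (a True times a string keeps the string, a False gives an empty string), then use where to put NaN back: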

m = df.notna()

out = m.mul(df['id'], axis=0).where(m)
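
The multiplication relies on plain Python semantics for bool times str. A minimal sketch of that mechanism without pandas follows; the list names are hypothetical and only mirror column 'A' of the example:

```python
# Row labels from the 'id' column and the notna() pattern of column 'A'.
ids = ['X', 'Y', 'Z']
mask_column_a = [True, False, True]

# True * 'X' is 'X' and False * 'Y' is '' (an empty string, not NaN).
products = [flag * label for flag, label in zip(mask_column_a, ids)]

# The empty strings are why the answer chains .where(m): cells where the
# mask is False must be turned back into NaN.
restored = [value if flag else float('nan')
            for flag, value in zip(mask_column_a, products)]
```

Here products holds the string for every True and an empty string for every False, and restored swaps those empty strings for NaN.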

Or with NumPy:

import numpy as np
import pandas as pd

m = df.notna()
# Where m is True take the row's id, otherwise keep the original value (NaN)
out = pd.DataFrame(np.where(m, np.repeat(df['id'].to_numpy()[:, None],
                                         df.shape[1], axis=1),
                            df),
                   index=df.index, columns=df.columns)
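
Since np.where broadcasts its arguments, the explicit np.repeat is probably not required. The sketch below assumes the example frame from the question; the object-dtype conversions are an addition here (not in the original answer) so that strings and NaN can share one array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})

# Boolean array of shape (3, 4): True where the cell is not NaN.
m = df.notna().to_numpy()

# Column vector of ids with shape (3, 1); np.where broadcasts it
# across all four columns.
ids = df['id'].to_numpy(dtype=object)[:, None]

# Original values as an object array, so NaN cells are kept as they are.
values = df.to_numpy(dtype=object)

out = pd.DataFrame(np.where(m, ids, values),
                   index=df.index, columns=df.columns)
```

The (3, 1) id vector is stretched to the mask's (3, 4) shape, so each row receives its own id wherever the mask is True.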

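Another idea, using reindexing: keep only the 'id' column, reindex to all columns, forward-fill across each row, then restore NaN (this relies on 'id' being the first column):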

out = df[['id']].reindex(columns=df.columns).ffill(axis=1).where(df.notna())

Output:

  id    A    B    C
0  X    X    X  NaN
1  Y  NaN    Y    Y
2  Z    Z  NaN    Z
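
A further variant that is not part of the answer above: DataFrame.mask accepts a Series as the replacement and, with axis=0, aligns it along the rows. This is a sketch under the assumption that the frame looks like the example in the question; behaviour may differ between pandas versions, so treat it as a starting point rather than a definitive solution:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['X', 'Y', 'Z'],
                   'A': [1, np.nan, 0],
                   'B': [0, 0, np.nan],
                   'C': [np.nan, 1, 1]})

# Replace every non-NaN cell with the 'id' value of its row;
# cells that are NaN do not match the condition and stay NaN.
out = df.mask(df.notna(), df['id'], axis=0)
```

The 'id' column itself is replaced by its own values, so it comes out unchanged.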

huangapple
  • Posted on February 24, 2023, 00:08:19
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/75547379.html