Replace missing values based on values in another column

Question

I need to replace the NaN values in a DataFrame:

df1 = pd.DataFrame([[1001, np.nan], [1001,'C'], [1004, 'D'],[1005, 'C'], 
                   [1005,'D'], [1010, np.nan],[1010,np.nan],[1010,'F']], columns=['CustomerNr','Costs'])
CustomerNr Costs
1001 NaN
1004 D
1005 C
1010 NaN
1010 NaN

I've tried:

df2 = pd.DataFrame([[1001, 'X'], [1010, 'Y']], columns=['CustomerNr','New Costs'])

Desired output:

CustomerNr Costs
1001 X
1004 D
1005 C
1010 Y
1010 Y

Answer 1

Score: 1

[Fill][1] the `NA/NaN` values from a Series mapped on the matching `'CustomerNr'` values:


    # Assign the result back: inplace=True on a column selection
    # may not update df1 in recent pandas versions.
    df1['Costs'] = df1['Costs'].fillna(
        df1['CustomerNr'].map(df2.set_index('CustomerNr')['New Costs']))


----------

       CustomerNr Costs
    0        1001     X
    1        1001     C
    2        1004     D
    3        1005     C
    4        1005     D
    5        1010     Y
    6        1010     Y
    7        1010     F


  [1]: https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html
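
To make the mapping step easier to follow, here is a minimal, self-contained sketch that splits the one-liner into its intermediate pieces. The names `lookup`, `candidates`, and `filled` are only illustrative and do not come from the answer, and the sketch assumes `df2` holds exactly one row per `CustomerNr`; with duplicate keys the `map` step would probably need a deduplicated lookup first.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, "C"], [1004, "D"], [1005, "C"],
     [1005, "D"], [1010, np.nan], [1010, np.nan], [1010, "F"]],
    columns=["CustomerNr", "Costs"],
)
df2 = pd.DataFrame([[1001, "X"], [1010, "Y"]],
                   columns=["CustomerNr", "New Costs"])

# Lookup Series: index is CustomerNr, values are New Costs.
lookup = df2.set_index("CustomerNr")["New Costs"]

# One candidate replacement per row of df1;
# customers that are absent from df2 map to NaN.
candidates = df1["CustomerNr"].map(lookup)

# fillna aligns on the row index and only fills rows where Costs is missing.
filled = df1["Costs"].fillna(candidates)

print(pd.DataFrame({"CustomerNr": df1["CustomerNr"], "Costs": filled}))
```

Because `fillna` only touches missing entries, rows such as `1001 C` and `1010 F` keep their original values even though their customers appear in `df2`.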



# Answer 2
**Score**: 0

I think you could use something like this:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame([[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                    [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
                   columns=['CustomerNr', 'Costs'])

# Replacement value for each CustomerNr
replace_dict = {1001: "X", 1010: "Y"}

# Row by row: look up the replacement when Costs is missing, otherwise keep Costs
df1['Costs'] = df1.apply(
    lambda x: replace_dict.get(x['CustomerNr']) if pd.isna(x['Costs']) else x['Costs'],
    axis=1)
```

Explanation: this creates a dictionary (`replace_dict`) that maps each `CustomerNr` to its replacement value, then uses `apply()` to assign that value where `Costs` is NaN and keep the original `Costs` value otherwise.
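
The same dictionary lookup can also be sketched without a row-wise `apply()`, by passing the dictionary to `Series.map` and feeding the result to `fillna`. This is only an illustrative variant, not part of the answer. The extra row for customer `1020` is hypothetical and is added to show what happens when a customer with a missing cost has no entry in `replace_dict`: `Series.map` yields NaN for keys that are not in the dictionary, so that row simply stays missing.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, "C"], [1004, "D"], [1005, "C"],
     [1005, "D"], [1010, np.nan], [1010, np.nan], [1010, "F"],
     [1020, np.nan]],  # hypothetical customer with no replacement defined
    columns=["CustomerNr", "Costs"],
)

replace_dict = {1001: "X", 1010: "Y"}

# Map every CustomerNr through the dictionary (NaN for unknown keys),
# then use those values only where Costs is missing.
df1["Costs"] = df1["Costs"].fillna(df1["CustomerNr"].map(replace_dict))

print(df1)
```

On larger frames this column-wise form will typically be faster than `apply(..., axis=1)`, since it avoids calling a Python function once per row.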

huangapple
  • Posted on 2023-02-26 22:09:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75572545.html