Replace missing values based on values in another column
# Question
I have the following problem: I need to replace the NaN values in a DataFrame.

    df1 = pd.DataFrame([[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                        [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
                       columns=['CustomerNr', 'Costs'])

| CustomerNr | Costs | 
|---|---|
| 1001 | NaN | 
| 1004 | D | 
| 1005 | C | 
| 1010 | NaN | 
| 1010 | NaN | 
I've tried:

    df2 = pd.DataFrame([[1001, 'X'], [1010, 'Y']], columns=['CustomerNr', 'New Costs'])

Desired output:
| CustomerNr | Costs | 
|---|---|
| 1001 | X | 
| 1004 | D | 
| 1005 | C | 
| 1010 | Y | 
| 1010 | Y | 
# Answer 1
**Score**: 1
[Fill][1] `NA/NaN` values based on a Series mapping (on matched `'CustomerNr'` values). Assigning the result back to the column avoids calling `fillna(..., inplace=True)` on a column selection, which newer pandas versions warn about:

    df1['Costs'] = df1['Costs'].fillna(
        df1['CustomerNr'].map(df2.set_index('CustomerNr')['New Costs']))

----------

       CustomerNr Costs
    0        1001     X
    1        1001     C
    2        1004     D
    3        1005     C
    4        1005     D
    5        1010     Y
    6        1010     Y
    7        1010     F

  [1]: https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html
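
To make the mapping step easier to follow, here is a minimal self-contained sketch that splits the one-liner above into its parts. The names `lookup`, `candidates`, and `result` are illustrative only and do not come from the answer; the sketch also assumes each `CustomerNr` appears at most once in `df2`, so that it can serve as a unique index.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
    columns=['CustomerNr', 'Costs'],
)
df2 = pd.DataFrame([[1001, 'X'], [1010, 'Y']],
                   columns=['CustomerNr', 'New Costs'])

# Lookup Series (illustrative name): index = CustomerNr, values = New Costs.
lookup = df2.set_index('CustomerNr')['New Costs']

# One candidate replacement per row of df1, aligned with df1's index.
# Customers that do not appear in df2 get NaN here.
candidates = df1['CustomerNr'].map(lookup)

# fillna only touches rows where Costs is missing; existing values are kept.
result = df1.assign(Costs=df1['Costs'].fillna(candidates))
print(result)
```

Because `fillna` only fills missing entries, rows such as `1001 / C` keep their original value even though a replacement exists for that customer. A row with a missing `Costs` whose customer is not in `df2` would simply stay NaN.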
# Answer 2
**Score**: 0
I think you could use something like this:
```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame([[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                    [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
                   columns=['CustomerNr', 'Costs'])

# Replacement value for each CustomerNr
replace_dict = {1001: "X", 1010: "Y"}

# Use the mapped value only where Costs is missing; otherwise keep the original
df1['Costs'] = df1.apply(
    lambda x: replace_dict.get(x['CustomerNr']) if pd.isna(x['Costs']) else x['Costs'],
    axis=1)
```
Explanation: `replace_dict` maps each `CustomerNr` to the value to assign. `apply()` with `axis=1` then goes through the rows one by one: if the value in `Costs` is NaN, it assigns the mapped value; otherwise it keeps the original value of `Costs`.
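
As a possible variation (not part of the answer above), the row-wise `apply()` can be swapped for a vectorized `Series.map` with the same dictionary, followed by `fillna`. This is only a sketch under the same sample data; it may be preferable on larger frames because it avoids calling a Python function per row.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
    columns=['CustomerNr', 'Costs'],
)

# Same mapping as in the answer: CustomerNr -> replacement value.
replace_dict = {1001: 'X', 1010: 'Y'}

# Series.map accepts a dict; keys that are not in the dict map to NaN,
# so customers without a replacement are left untouched by fillna.
df1['Costs'] = df1['Costs'].fillna(df1['CustomerNr'].map(replace_dict))
print(df1)
```

One small difference from the `apply()` version: there, a missing `Costs` for a customer absent from `replace_dict` becomes `None` (the default of `dict.get`), while here it remains NaN.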