Replace missing values based on values in another column
# Question
I have the following problem: I need to replace the NaN values in the `Costs` column of a DataFrame.

    df1 = pd.DataFrame([[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                        [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
                       columns=['CustomerNr', 'Costs'])

| CustomerNr | Costs |
|---|---|
| 1001 | NaN |
| 1004 | D |
| 1005 | C |
| 1010 | NaN |
| 1010 | NaN |

I've tried using a second DataFrame that holds the replacement values:

    df2 = pd.DataFrame([[1001, 'X'], [1010, 'Y']], columns=['CustomerNr', 'New Costs'])

Desired output:

| CustomerNr | Costs |
|---|---|
| 1001 | X |
| 1004 | D |
| 1005 | C |
| 1010 | Y |
| 1010 | Y |
# Answer 1
**Score**: 1
[Fill][1] `NA`/`NaN` values based on a Series mapping (on matching `'CustomerNr'` values):

    # Assign the result back to the column; fillna(inplace=True) on a selected
    # column may not update df1 in newer pandas versions
    df1['Costs'] = df1['Costs'].fillna(
        df1['CustomerNr'].map(df2.set_index('CustomerNr')['New Costs']))

----------

       CustomerNr Costs
    0        1001     X
    1        1001     C
    2        1004     D
    3        1005     C
    4        1005     D
    5        1010     Y
    6        1010     Y
    7        1010     F

  [1]: https://pandas.pydata.org/docs/reference/api/pandas.Series.fillna.html
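
To see what the `fillna` + `map` approach does, it can help to split it into separate steps. This is only a minimal sketch of that idea: the names `lookup` and `candidates` are made up for illustration, and it assumes each `CustomerNr` appears at most once in `df2` (mapping through a Series with a duplicated index is not expected to work).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
    columns=['CustomerNr', 'Costs'])
df2 = pd.DataFrame([[1001, 'X'], [1010, 'Y']],
                   columns=['CustomerNr', 'New Costs'])

# Step 1: turn df2 into a lookup Series indexed by CustomerNr
lookup = df2.set_index('CustomerNr')['New Costs']

# Step 2: look up a candidate replacement for every row of df1;
# customers that are missing from df2 (1004, 1005) get NaN here
candidates = df1['CustomerNr'].map(lookup)

# Step 3: use the candidates only where Costs is currently missing
df1['Costs'] = df1['Costs'].fillna(candidates)

print(df1)
```

Because `candidates` shares the index of `df1`, `fillna` aligns the two row by row, so existing values such as `'C'` or `'F'` are left untouched.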
# Answer 2
**Score**: 0
I think you could use something like this:

    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame([[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
                        [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
                       columns=['CustomerNr', 'Costs'])

    # Replacement value for each CustomerNr
    replace_dict = {1001: "X", 1010: "Y"}

    # Row by row: look up the replacement when Costs is NaN, otherwise keep Costs
    df1['Costs'] = df1.apply(lambda x: replace_dict.get(x['CustomerNr']) if pd.isna(x['Costs']) else x['Costs'], axis=1)

Explanation: this creates a dictionary (`replace_dict`) that maps each `CustomerNr` to the value to assign, then uses `apply()` row by row to assign that value when `Costs` is NaN and to keep the original `Costs` value otherwise. Note that `replace_dict.get` returns `None` for a `CustomerNr` that is not in the dictionary.
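
The same dictionary can probably also be used without a row-wise `apply`, since `Series.map` accepts a dict. The following is a rough sketch rather than a drop-in replacement: `filled` and `row_wise` are names chosen here for illustration, and the two results are only compared on the sample data from the question.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [[1001, np.nan], [1001, 'C'], [1004, 'D'], [1005, 'C'],
     [1005, 'D'], [1010, np.nan], [1010, np.nan], [1010, 'F']],
    columns=['CustomerNr', 'Costs'])
replace_dict = {1001: "X", 1010: "Y"}

# Vectorized variant: map every CustomerNr through the dict
# (keys that are not in the dict become NaN), then fill only the gaps
filled = df1['Costs'].fillna(df1['CustomerNr'].map(replace_dict))

# Row-wise variant from the answer, kept for comparison
row_wise = df1.apply(
    lambda row: replace_dict.get(row['CustomerNr'])
    if pd.isna(row['Costs']) else row['Costs'],
    axis=1)

print(filled.tolist())
print(row_wise.tolist())
```

On this sample both variants give the same values. They can differ when a row has a missing `Costs` and a `CustomerNr` that is not in the dictionary: the mapped version leaves NaN there, while `dict.get` yields `None`.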