Replace missing values with the value of the column with the minimum sum of differences
Question
I have the DataFrame below.
import numpy as np
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'Age': [np.nan, 31, 29, 43, np.nan],
                   'Weight': [np.nan, 100, 60, 75, np.nan],
                   'Height': [1.65, 1.64, 1.75, 1.70, 1.68],
                   'BMI': [19, 15, 10, 25, 30]})
The columns in which I want to replace missing values are:
case_columns = ['Age', 'Weight']
I want an algorithm in Python that replaces each missing value with the value from the row that has the minimum sum of differences from the row containing the missing value.
In my example, in row 0 the age should be 31 and the weight 100, because row 1 has the minimum difference ((1.65 - 1.64) + (19 - 15)). In row 4 the age should be 43 and the weight 75.
How can I do this in Python?
Answer 1
Score: 1
You can try writing a function and applying it row by row with df.apply(). Note that this version picks the closest row by Height only.
def fill_missing(x):
    # if any value other than Height is missing in this row
    if any(np.isnan(x.drop('Height'))):
        # work on a copy so the row passed in by apply() is not mutated
        x = x.copy()
        # absolute Height difference to every other row (exclude the current row)
        height_diff = np.abs(df.drop(x.name)['Height'] - x['Height'])
        # index of the row with the smallest difference (remember to use abs)
        row_idx = height_diff.idxmin()
        # substitute whatever is missing
        for feature in x.index:
            if np.isnan(x[feature]):
                x[feature] = df.loc[row_idx, feature]
    return x

# returns a new DataFrame and leaves df unchanged
df.apply(fill_missing, axis=1)
# if you want to change the value of df
df = df.apply(fill_missing, axis=1)
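The function above compares Height only, whereas the question asks for the sum of differences over both Height and BMI. Below is a minimal sketch of that variant, not a definitive implementation. The names fill_from_nearest_row, match_columns and donors are made up for illustration. It assumes that absolute differences are intended, that every column outside case_columns takes part in the comparison, and that only rows with no missing case values may supply replacements.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [np.nan, 31, 29, 43, np.nan],
                   'Weight': [np.nan, 100, 60, 75, np.nan],
                   'Height': [1.65, 1.64, 1.75, 1.70, 1.68],
                   'BMI': [19, 15, 10, 25, 30]})

case_columns = ['Age', 'Weight']
# Assumption: every column that is not being filled is used for the comparison.
match_columns = [c for c in df.columns if c not in case_columns]


def fill_from_nearest_row(data, case_columns, match_columns):
    """Fill NaNs in case_columns from the complete row with the smallest
    sum of absolute differences over match_columns (hypothetical helper)."""
    result = data.copy()
    # Only rows with no missing case values can act as donors.
    donors = data.dropna(subset=case_columns)
    # Rows that have at least one missing value in case_columns.
    incomplete = data.index[data[case_columns].isna().any(axis=1)]
    for idx in incomplete:
        # Sum of absolute differences between this row and each donor row.
        diffs = donors[match_columns] - data.loc[idx, match_columns]
        distance = diffs.abs().sum(axis=1)
        nearest = distance.idxmin()
        # Copy only the values that are actually missing.
        for col in case_columns:
            if pd.isna(result.loc[idx, col]):
                result.loc[idx, col] = donors.loc[nearest, col]
    return result


filled = fill_from_nearest_row(df, case_columns, match_columns)
print(filled)
```

On the sample data this gives row 0 the values of row 1 (Age 31, Weight 100) and row 4 the values of row 3 (Age 43, Weight 75), matching the result requested in the question. The columns are not rescaled, so BMI dominates the sum because Height differences are tiny by comparison; standardizing the match columns first may be worth considering. If no complete row exists, idxmin() raises an error, which this sketch does not handle.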