Can sklearn's KNNImputer work with specific rows within a DataFrame?
Question
I have a pandas DataFrame with some NaN values, and I am trying to fill them with scikit-learn's KNNImputer. I want the imputer to pick 'neighbors' only among rows that share the same "patient_id". The missing values are medical analysis results.
I tried to solve this by first building a list of the unique "patient_id" values:
patient_list=data['patient_id'].unique()
Then I looped over that list, masking the DataFrame by 'patient_id', imputing each sub-DataFrame, and trying to merge the results back together:
from sklearn.impute import KNNImputer
knn = KNNImputer(missing_values=np.nan)
data_imputed = pd.DataFrame()
for patient_id in patient_list:
    X = knn.fit_transform(data[data['patient_id']==patient_id])
    X_ = pd.DataFrame(X, columns = data.columns)
    data_imputed.merge(X_, on=['patient_id','visit_month','visit_id'], how='left', copy=False)
But it raises a ValueError:
ValueError: Shape of passed values is (4, 1187), indices imply (4, 1198)
My original dataframe has 1198 columns, so how did 11 columns go missing? Thank you for helping!
Answer 1
Score: 0
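import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
knn = KNNImputer(missing_values=np.nan)
data_imputed = []
for patient_id in patient_list:
    subset = data[data['patient_id'] == patient_id]
    # By default KNNImputer drops columns that are all-NaN in the rows it is
    # fitted on (hence 1187 instead of 1198), so impute only columns with data.
    kept = subset.columns[subset.notna().any()]
    X = knn.fit_transform(subset[kept])
    X_ = pd.DataFrame(X, columns=kept, index=subset.index)
    data_imputed.append(X_)  # list.append takes no on=/how=/copy= keywords
# Stack the per-patient frames; reindex restores the skipped columns as NaN.
data_imputed = pd.concat(data_imputed).reindex(columns=data.columns)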
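The missing 11 columns are most likely columns in which that particular patient has no values at all: by default KNNImputer leaves such columns out of its output, so the returned array is narrower than data.columns. Below is a minimal sketch of per-patient imputation that avoids the mismatch. It makes some assumptions: the toy data is made up, impute_group is a hypothetical helper name, n_neighbors=2 is chosen only for the example, and every column is taken to be numeric.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Toy data (hypothetical values): patient 2 has no measurement at all for "b".
data = pd.DataFrame({
    "patient_id":  [1, 1, 1, 2, 2],
    "visit_month": [0, 6, 12, 0, 6],
    "a": [1.0, np.nan, 3.0, 10.0, np.nan],
    "b": [5.0, 7.0, np.nan, np.nan, np.nan],
})


def impute_group(group: pd.DataFrame) -> pd.DataFrame:
    """Impute one patient's rows using only that patient's other rows."""
    # Columns with at least one observed value for this patient.
    kept = group.columns[group.notna().any()]
    imputed = KNNImputer(n_neighbors=2).fit_transform(group[kept])
    out = pd.DataFrame(imputed, columns=kept, index=group.index)
    # Put back the columns that were skipped; they stay NaN.
    return out.reindex(columns=group.columns)


# Impute each patient separately, then stack the results.
data_imputed = pd.concat(
    [impute_group(group) for _, group in data.groupby("patient_id")]
)
print(data_imputed)
```

Columns that are entirely empty for a patient stay NaN here, because that patient has no neighbors to borrow from; they would need a separate fallback (for example, imputing across all patients) if they must be filled.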