Can sklearn's KNN Imputer work with specific rows within a dataframe?

Question

I have a pandas dataframe with some NaN values, and I am trying to fill them with scikit-learn's KNNImputer. I want the imputer to pick its 'neighbors' only from rows that share the same "patient_id", not from the whole dataframe. The missing values are medical analysis results.

I tried to solve this by first building a list of the unique "patient_id" values:

patient_list=data['patient_id'].unique()

Then I looped over that list, masked the dataframe by 'patient_id', imputed each subset, and tried to merge the sub-dataframes back together:

import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
knn = KNNImputer(missing_values=np.nan)

data_imputed = pd.DataFrame()

for patient_id in patient_list:
    X = knn.fit_transform(data[data['patient_id']==patient_id])
    X_ = pd.DataFrame(X, columns = data.columns)
    data_imputed.merge(X_, on=['patient_id','visit_month','visit_id'], how='left', copy=False)

But this raises a ValueError:

ValueError: Shape of passed values is (4, 1187), indices imply (4, 1198)

My original dataframe has 1198 columns, so how did 11 columns go missing? Thanks for your help!

Answer 1

Score: 0

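import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# keep_empty_features=True (scikit-learn >= 1.2) keeps columns that are all-NaN
# for a given patient, so the output width always matches data.columns.
knn = KNNImputer(missing_values=np.nan, keep_empty_features=True)

data_imputed = []  # one imputed sub-dataframe per patient

for patient_id in patient_list:
    subset = data[data['patient_id'] == patient_id]
    X = knn.fit_transform(subset)
    X_ = pd.DataFrame(X, columns=data.columns, index=subset.index)
    data_imputed.append(X_)  # list.append takes a single argument, no merge keywords

# Stack the per-patient results back into one dataframe.
data_imputed = pd.concat(data_imputed)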

huangapple
  • Published on August 9, 2023, 07:56:47
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863785.html