How to run K-means on specific columns?
Question
I would like to run K-means on specific columns of my data set.
Because these columns are categorical, I plan to one-hot encode them first. Is it possible to run K-means on only those columns and then display the results (for example, the rows of one cluster) together with all the columns?
For example, I have col1, col2 and col3. I want to run K-means on col2 and col3, which are one-hot encoded, and display the results with col1, col2 and col3.
I hope I have expressed my question clearly.
Answer 1
Score: 4
This follows the basic KMeans usage from the scikit-learn documentation:
from sklearn.cluster import KMeans
# select the columns to cluster on here; they must be numeric,
# so categorical columns need to be one-hot encoded first
X = df[['col1', 'col2', 'col3']]
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# returns the cluster label of each row, in row order
kmeans.predict(X)
So kmeans.predict(X) returns one cluster label per row, which you can add to your original data as a new column.
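To make the col2/col3 case from the question concrete, here is a minimal sketch. It assumes a small made-up DataFrame (the values are hypothetical), uses pandas `get_dummies` for the one-hot encoding, and stores the labels in a new column named `cluster` (a name chosen here for illustration). It is one possible way to do it, not the only one.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data; col2 and col3 are the categorical columns to cluster on.
df = pd.DataFrame({
    "col1": [10, 20, 30, 40, 50, 60],
    "col2": ["a", "a", "a", "b", "b", "b"],
    "col3": ["x", "x", "x", "y", "y", "y"],
})

# One-hot encode only the clustering columns; col1 is deliberately left out of X.
X = pd.get_dummies(df[["col2", "col3"]], dtype=float)

# n_clusters=2 is an assumption that happens to suit this toy data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Labels come back in row order, so they can sit next to the original columns.
df["cluster"] = kmeans.labels_

# Show one group (the cluster that row 0 belongs to) with all original columns.
group = df[df["cluster"] == df.loc[0, "cluster"]]
print(group[["col1", "col2", "col3"]])
```

Because the labels are aligned with the rows of `df`, col1 never has to be part of the clustering input in order to appear in the displayed result.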