How do I run K-means on specific columns?

Question

I would like to run K-means on specific columns of my data set. Since these columns are categorical, I plan to one-hot encode them. Is it possible to run K-means on only those columns and then display the results (for example, the rows of one cluster) together with all of the columns?

For example, I have col1, col2 and col3. I want to run K-means on col2 and col3, which are one-hot encoded, and display the results with col1, col2 and col3.
I hope I have expressed my question clearly.

Answer 1

Score: 4

This follows the basic KMeans example in the scikit-learn documentation:

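import pandas as pd
from sklearn.cluster import KMeans

# select only the columns to cluster on (col2 and col3 in the question)
# and one-hot encode them; df is your existing DataFrame
X = pd.get_dummies(df[['col2', 'col3']], dtype=float)
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
# predict returns one cluster label per row; store it next to all original columns
df['cluster'] = kmeans.predict(X)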

kmeans.predict(X) returns the cluster label of each row, in row order, so you can add it to your original DataFrame as a new column and then view any group together with all of its columns.
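
To make the idea concrete, here is a minimal end-to-end sketch of what the question describes: clustering on col2 and col3 only, then showing one group with col1, col2 and col3. The toy DataFrame, the `cluster` column name and the choice of two clusters are assumptions for illustration, not part of the original answer. `n_init` is passed explicitly only because its default differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data; replace this with your own DataFrame.
df = pd.DataFrame({
    "col1": [10, 20, 30, 40, 50, 60],
    "col2": ["a", "a", "b", "b", "a", "b"],
    "col3": ["x", "x", "y", "y", "x", "y"],
})

# One-hot encode only the columns used for clustering; col1 is left out of X.
X = pd.get_dummies(df[["col2", "col3"]], dtype=float)

# Fit K-means on the encoded columns (2 clusters is an assumption here).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# labels_ has one entry per row, in the same order as df,
# so it can be attached to the original, un-encoded columns.
result = df.assign(cluster=kmeans.labels_)

# Display one group with all of the original columns.
group0 = result[result["cluster"] == 0]
print(group0)
```

Because the labels line up with the rows of df, col1 never has to take part in the clustering to appear in the output. Which group receives label 0 or 1 is arbitrary, so select groups by inspecting the labels rather than by assuming a fixed meaning.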

huangapple
  • Published on January 6, 2020 at 23:03:58
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59614387.html