Is there a Python library that produces exactly the same output as R's kmeans?
Question
I need to perform K-means clustering on my data. I have implemented the K-means algorithm in both R and Python, specifically using the libraries sklearn and SciPy. However, I am encountering a discrepancy in the clustering results between the two languages, where Python seems to generate an outlier that is not present in R.
I have ensured that I am using the same input data and parameters (e.g., number of clusters, initialization method) in both R and Python. Despite this, I am unable to obtain identical cluster centers as generated by R. I have also attempted using the K-means++ initialization method in Python, but the issue persists.
I would greatly appreciate any insights or suggestions on how to resolve this discrepancy and achieve consistent clustering results between R and Python. Is there any specific consideration or parameter setting that I might be missing?
Answer 1
Score: 1
I am not a machine learning expert, so take this with a grain of salt, but have you considered that the results might differ because k-means is not deterministic? The outcome depends on the randomly chosen initial centers, so two runs, even with the same data and the same number of clusters, can converge to different solutions.
If I were you, I would run the algorithm with a lower tolerance and a higher maximum number of iterations in both R and Python, and check whether the results still differ.
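One way to take randomness out of the comparison on the Python side is sketched below. It is only an illustration under stated assumptions: the toy data, the helper name fit_fixed_start, and the chosen starting rows are hypothetical, not taken from the question. The idea is to pass explicit starting centers to sklearn's KMeans, use a single run (n_init=1), and tighten tol and max_iter as suggested above. The same starting rows could then be given to R through the centers argument of kmeans. Note also that R's kmeans defaults to the Hartigan-Wong algorithm, whereas sklearn implements Lloyd-style updates, so passing algorithm = "Lloyd" in R is probably needed before the two can be expected to agree.

```python
import numpy as np
from sklearn.cluster import KMeans


def fit_fixed_start(X, init_centers, tol=1e-10, max_iter=1000):
    """Run k-means from explicit starting centers so no randomness is involved."""
    km = KMeans(
        n_clusters=init_centers.shape[0],
        init=init_centers,  # explicit start instead of "k-means++" or "random"
        n_init=1,           # a single run; restarts add nothing with fixed centers
        tol=tol,            # lower tolerance, as suggested in the answer
        max_iter=max_iter,  # higher iteration cap, as suggested in the answer
    )
    km.fit(X)
    # Sort the centers by coordinates so a comparison ignores cluster label order.
    order = np.lexsort(km.cluster_centers_.T[::-1])
    return km.cluster_centers_[order]


# Hypothetical toy data: three well-separated 2-D blobs.
rng = np.random.default_rng(0)
blob_means = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in blob_means])

# One starting point taken from each blob (rows 0, 50 and 100 of X).
init_centers = X[[0, 50, 100]]

centers_a = fit_fixed_start(X, init_centers)
centers_b = fit_fixed_start(X, init_centers)

print(centers_a)
print(np.allclose(centers_a, centers_b))
```

If the sorted centers still differ from R's after fixing the starting centers and the algorithm, the remaining difference is more likely to come from the data preparation (scaling, row order, missing values) than from the clustering itself.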