What is the time complexity of the pandas groupby function?

Question

I tried searching for it, but couldn't find it anywhere. This article I read about groupby says that groupby works by splitting and binning the items, but I couldn't work out the time complexity from that with any confidence.

https://www.geeksforgeeks.org/pandas-groupby/

I also looked at the groupby implementation, but sadly I couldn't make sense of it.
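In pandas terms, the "splitting and binning" the article describes is visible through the groups attribute of a GroupBy object, which maps each key to the row labels in its bin; an aggregation such as sum is then applied to each bin and the results are combined. The DataFrame below and its column names key and value are made up for illustration.

```python
import pandas as pd

# Made-up example data: "key" decides the group, "value" gets aggregated.
df = pd.DataFrame({"key": ["b", "a", "b", "a", "c"],
                   "value": [1, 2, 3, 4, 5]})

grouped = df.groupby("key")

# Split: each key is binned together with the row labels that carry it.
bins = {k: idx.tolist() for k, idx in grouped.groups.items()}
print(bins["a"], bins["b"], bins["c"])  # [1, 3] [0, 2] [4]

# Apply + combine: sum each bin and collect one result per key.
totals = grouped["value"].sum()
print(totals.to_dict())  # {'a': 6, 'b': 4, 'c': 5}
```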

Answer 1

Score: 2

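Splitting the rows into groups is O(n), with n the number of rows: the group keys are hashed, so each row is assigned to its group in expected constant time.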
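To see why the split is linear, here is a pure-Python sketch of hash-based splitting: one pass over the rows, with an average O(1) dictionary lookup per row. The function name split_rows is hypothetical, and pandas does this work in compiled code rather than with a Python dict, so treat it as an illustration of the complexity argument, not of pandas' internals.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def split_rows(keys: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: bin row positions by key in a single O(n) pass."""
    bins: Dict[Hashable, List[int]] = defaultdict(list)
    for row, key in enumerate(keys):  # n iterations
        bins[key].append(row)         # expected O(1) hash lookup + append
    # Keys come out in order of first appearance: nothing has been sorted yet.
    return dict(bins)


print(split_rows(["b", "a", "b", "a", "c"]))
# {'b': [0, 2], 'a': [1, 3], 'c': [4]}
```

Note that the groups come out unsorted; sorting them is the separate step discussed next.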

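Since groupby sorts the group keys by default (sort=True), there is an extra step that sorts the k unique keys, costing O(k log k). The total is therefore O(n + k log k). Because k ≤ n, the worst case (every row is its own group) is O(n log n). This sorting step is why the documentation of the sort parameter says "Get better performance by turning this off".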
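The O(n + k log k) bound can be illustrated with a sorted factorization, which is roughly how sorted group codes are obtained: assign integer codes in one O(n) pass, sort only the k unique keys in O(k log k), then remap every row's code through a table of size k in another O(n) pass. sorted_group_codes is a hypothetical pure-Python helper written for this illustration; the last lines compare its result with pandas.factorize(..., sort=True).

```python
import pandas as pd


def sorted_group_codes(keys):
    """Hypothetical helper: return (codes, uniques) with uniques sorted."""
    # Pass 1, O(n): code each row by the first appearance of its key.
    first_seen = {}
    codes = []
    for key in keys:
        codes.append(first_seen.setdefault(key, len(first_seen)))
    uniques = list(first_seen)  # k keys, in order of first appearance

    # O(k log k): only the k unique keys are sorted, never the n rows.
    order = sorted(range(len(uniques)), key=uniques.__getitem__)
    remap = [0] * len(uniques)
    for new_code, old_code in enumerate(order):
        remap[old_code] = new_code

    # Pass 2, O(n): translate every row's code through the size-k table.
    return [remap[c] for c in codes], [uniques[i] for i in order]


keys = ["b", "a", "b", "a", "c"]
codes, uniques = sorted_group_codes(keys)
print(codes, uniques)  # [1, 0, 1, 0, 2] ['a', 'b', 'c']

pd_codes, pd_uniques = pd.factorize(pd.Series(keys), sort=True)
print(pd_codes.tolist(), list(pd_uniques))  # [1, 0, 1, 0, 2] ['a', 'b', 'c']
```

With sort=False the middle step disappears and the codes simply follow first appearance, which is where the saving comes from.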

huangapple
  • Posted on 10 August 2023 10:06:09
  • When reposting, please keep the link to this post: https://go.coder-hub.com/76872228.html