Polars memory usage as compared to {data.table}

Question

Fairly new to python-polars.

How does it compare to R's {data.table} package in terms of memory usage?

How does it handle shallow copying?

Is in-place/by reference updating possible/the default?

Are there any recent benchmarks on memory efficiency of the big 4 in-mem data wrangling libs (polars vs data.table vs pandas vs dplyr)?

Answer 1

Score: 3

> How does it handle shallow copying?

Polars memory buffers are reference-counted and copy-on-write. That means you never make a full copy of the data within Polars: cloning a DataFrame shares the existing buffers instead of duplicating them.
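A toy Python model may help make reference-counted copy-on-write concrete. The `_Buffer` and `Column` classes below are hypothetical and are not Polars' internals (which are written in Rust): cloning a column shares its buffer and increments a reference count, and the data is duplicated only when a shared buffer is written to.

```python
# Toy model of a reference-counted, copy-on-write column buffer.
# Hypothetical classes for illustration only; this is not Polars' internals.


class _Buffer:
    """Backing storage plus a count of how many columns point at it."""

    def __init__(self, values):
        self.values = list(values)
        self.refcount = 1


class Column:
    def __init__(self, values=None, *, _buffer=None):
        self._buf = _buffer if _buffer is not None else _Buffer(values or [])

    def clone(self):
        # Shallow "copy": share the buffer and bump the reference count.
        self._buf.refcount += 1
        return Column(_buffer=self._buf)

    def set(self, index, value):
        # Copy on write: duplicate the data only if another column shares it.
        # (In this toy, counts are only decremented here; a real
        # implementation would also decrement when a column is dropped.)
        if self._buf.refcount > 1:
            self._buf.refcount -= 1
            self._buf = _Buffer(self._buf.values)
        self._buf.values[index] = value

    def to_list(self):
        return list(self._buf.values)

    def shares_buffer_with(self, other):
        return self._buf is other._buf


a = Column([1, 2, 3])
b = a.clone()
print(a.shares_buffer_with(b))  # True: no data was copied
b.set(0, 99)
print(a.shares_buffer_with(b))  # False: the write triggered a copy
print(a.to_list(), b.to_list())  # [1, 2, 3] [99, 2, 3]
```

The clone is O(1) regardless of column size; the cost of copying is paid only by the first write to a shared buffer, and a column that owns its buffer alone is written in place.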

> Is in-place/by reference updating possible/the default?

No, you must reassign the variable. Under the hood Polars may reuse memory buffers, but that is not visible to users.
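A minimal sketch of what "no in-place updates" means for user code, using a hypothetical `Frame` class (not the Polars API) whose `with_column` method is loosely named after Polars' `DataFrame.with_columns`: every update returns a new frame, untouched columns are shared rather than copied, and the original only changes if you reassign the variable.

```python
# Toy model of an API without in-place updates: every "mutation" returns a
# new frame, so the caller has to reassign the variable. Hypothetical class
# for illustration only; it is not the Polars API.
from types import MappingProxyType


class Frame:
    def __init__(self, columns):
        # Columns are stored as tuples so they cannot be modified in place.
        self._columns = {name: tuple(values) for name, values in columns.items()}

    @property
    def columns(self):
        # Read-only view of the column mapping.
        return MappingProxyType(self._columns)

    def with_column(self, name, values):
        # Build a new frame; untouched columns are shared, not copied.
        # Sharing is invisible to the caller because tuples are immutable.
        new = Frame.__new__(Frame)
        new._columns = dict(self._columns)
        new._columns[name] = tuple(values)
        return new


df = Frame({"x": [1, 2, 3], "y": [4, 5, 6]})
df.with_column("y", [0, 0, 0])        # result discarded: df is unchanged
print(df.columns["y"])                 # (4, 5, 6)
df2 = df.with_column("y", [0, 0, 0])   # reassign to keep the update
print(df2.columns["y"])                # (0, 0, 0)
print(df2.columns["x"] is df.columns["x"])  # True: "x" is shared
```

With real Polars the equivalent idiom is `df = df.with_columns(...)`.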

> Are there any recent benchmarks on memory efficiency?

Comparing memory usage alone also doesn't do justice to the design differences. Polars is currently developing an out-of-core engine. Rather than processing all data in memory, this engine will stream data from disk. Its design philosophy is to use as much memory as needed without going OOM (out of memory): unused memory is wasted potential.
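The core idea of out-of-core processing can be sketched with the standard library alone: read from disk in fixed-size batches and fold each batch into a running result, so peak memory is bounded by the batch size rather than by the file size. `streaming_sum` is a hypothetical helper for illustration and does not reflect the actual implementation of the Polars engine.

```python
# Toy model of out-of-core processing: read a CSV from disk in fixed-size
# batches and fold each batch into a running total, so peak memory depends
# on the batch size rather than on the file size. Hypothetical helper for
# illustration only; it is not the Polars streaming engine.
import csv
import itertools
import os
import tempfile


def streaming_sum(path, column, batch_size=1_000):
    total = 0
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        while True:
            # Pull at most `batch_size` rows into memory at a time.
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            total += sum(int(row[column]) for row in batch)
    return total


# Usage: write a small file, then aggregate it in batches of 3 rows.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["value"])
        writer.writerows([i] for i in range(10))
    print(streaming_sum(path, "value", batch_size=3))  # 45
```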

huangapple
  • Posted on 4 June 2023 at 22:51:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76400975.html