Polars memory usage as compared to {data.table}
Question
I'm fairly new to python-polars. How does it compare to R's {data.table} package in terms of memory usage?
How does it handle shallow copying?
Is in-place/by reference updating possible/the default?
Are there any recent benchmarks on memory efficiency of the big 4 in-mem data wrangling libs (polars vs data.table vs pandas vs dplyr)?
Answer 1
Score: 3
> How does it handle shallow copying?
Polars memory buffers are reference-counted and copy-on-write. That means copying a DataFrame within Polars never triggers a full copy of the data; the copies share the underlying buffers.
> Is in-place/by reference updating possible/the default?
No, you must reassign the variable. Under the hood, Polars may reuse memory buffers, but that is not visible to the user.
> Are there any recent benchmarks on memory efficiency?
Comparing memory usage alone also doesn't do justice to the design differences. Polars is currently developing an out-of-core engine. This engine doesn't process all data in memory, but streams data from disk. The design philosophy of that engine is to use as much memory as needed without going OOM (out of memory). Unused memory is wasted potential.