Why does sys.getsizeof fail on a pandas Series or DataFrame when it holds a type?

Question

sys.getsizeof fails on a Series that holds a type object as its data. I have a function that needs to calculate the size of any given argument, and I use sys.getsizeof for that. This is a problem because getsizeof fails unexpectedly on Series and DataFrames like this one. Is there a way to avoid this failure?

import sys
import pandas as pd

df = pd.Series(str)

sys.getsizeof(df)
TypeError: descriptor '__sizeof__' of 'str' object needs an argument

Answer 1

Score: 3

This is a Pandas bug.

Pandas makes the unusual decision of trying to compute a "deep" sizeof, including all element sizes, rather than just the memory consumption directly attributable to the Series instance itself. The __sizeof__ implementation for a Series instance eventually hits a loop that tries to call __sizeof__ on the elements:

for i in range(n):
    size += arr[i].__sizeof__()
return size

but calling __sizeof__ like this is incorrect. It should really call sys.getsizeof(arr[i]).

Calling __sizeof__ like this is wrong for two reasons: first, as you've seen, it fails when an element is a type object, because str.__sizeof__ is the unbound method for computing the size of a string, not the method for computing the size of the str type object itself. Second, sys.getsizeof adds corrections for GC metadata that __sizeof__ doesn't account for.
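Both points can be checked without pandas. The following is a minimal sketch, assuming CPython; the helper name deep_size is hypothetical and is only an illustration of summing element sizes with sys.getsizeof, not the pandas implementation.

```python
import sys

# Looking up __sizeof__ on the class `str` itself finds the method meant for
# str *instances*, so calling it with no argument raises TypeError.
try:
    str.__sizeof__()
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# sys.getsizeof looks the method up on type(obj), so it works for type objects.
size_of_str_type = sys.getsizeof(str)

# For GC-tracked objects such as lists, sys.getsizeof also counts the
# garbage-collector header that __sizeof__ leaves out (CPython behaviour).
gc_overhead = sys.getsizeof([]) - [].__sizeof__()


def deep_size(items):
    """Hypothetical helper: sum element sizes via sys.getsizeof."""
    return sum(sys.getsizeof(item) for item in items)


# A type object among the elements is no longer a problem.
total = deep_size([str, "abc", 1.5])
```

The exact byte counts depend on the interpreter version and platform, so the sketch only compares values rather than relying on specific numbers.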

Answer 2

Score: 2

I fixed this; see https://github.com/pandas-dev/pandas/issues/51858.
Upgrade pandas and you should be able to run the same code with no problem:

import sys
import pandas as pd

df = pd.Series(str)
sys.getsizeof(df)
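If upgrading is not an option, the calling function can guard the size computation itself. This is a minimal sketch under stated assumptions: safe_getsizeof and BrokenSizeof are hypothetical names, the fallback reports only a shallow size, and BrokenSizeof merely simulates an object whose __sizeof__ raises TypeError, as the Series in the question does.

```python
import sys


def safe_getsizeof(obj):
    """Hypothetical wrapper: return sys.getsizeof(obj), or a shallow size
    if the object's own __sizeof__ raises TypeError."""
    try:
        return sys.getsizeof(obj)
    except TypeError:
        # object.__sizeof__ bypasses the custom __sizeof__ and reports only
        # the base object size, so this is a lower bound, not a deep size.
        return object.__sizeof__(obj)


class BrokenSizeof:
    """Stand-in for an object whose __sizeof__ fails."""

    def __sizeof__(self):
        raise TypeError("simulated failure")


fallback_size = safe_getsizeof(BrokenSizeof())  # takes the fallback path
normal_size = safe_getsizeof("abc")             # takes the normal path
```

For a pandas object, a shallow memory_usage(deep=False) call may be a more informative fallback than object.__sizeof__, since it should avoid the per-element loop described in Answer 1.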

huangapple
  • Published on March 9, 2023 at 17:58:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75682987.html