Why does sys.getsizeof fail on a pandas Series or DataFrame when it holds a type?
Question
sys.getsizeof fails on a Series that holds a type as data. I have a function that needs to calculate the size of any given argument, which I do with sys.getsizeof. This is a problem, because getsizeof fails unexpectedly for this kind of Series or DataFrame. Is there a way to avoid this failure in getsizeof?
import sys
import pandas as pd

df = pd.Series(str)  # a Series whose only element is the type object `str`
sys.getsizeof(df)
TypeError: descriptor '__sizeof__' of 'str' object needs an argument
Answer 1
Score: 3
This is a Pandas bug.
Pandas makes the unusual decision of trying to compute a "deep" sizeof, including all element sizes, rather than just the memory consumption directly attributable to the Series instance itself. The __sizeof__ implementation for a Series instance eventually hits a loop that tries to call __sizeof__ on the elements:
for i in range(n):
    size += arr[i].__sizeof__()
return size
But calling __sizeof__ like this is incorrect; it should really call sys.getsizeof(arr[i]).
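For illustration, here is a pure-Python sketch of the quoted loop with that fix applied: each element is measured with sys.getsizeof instead of by calling its __sizeof__ directly. The function name object_elements_size is hypothetical, and pandas' real implementation is Cython and may differ in its details.

```python
import sys

def object_elements_size(arr):
    """Sum the sizes of the objects referenced by an object array.

    Hypothetical pure-Python stand-in for the loop quoted above, using
    sys.getsizeof(element) in place of element.__sizeof__().
    """
    size = 0
    for i in range(len(arr)):
        # sys.getsizeof looks __sizeof__ up on type(arr[i]), so a type object
        # such as `str` is measured via type.__sizeof__ instead of failing.
        size += sys.getsizeof(arr[i])
    return size

# Works even when one of the elements is a type object.
print(object_elements_size(["abc", str, 42]))
```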
Calling __sizeof__ like this is wrong for two reasons: first, as you've seen, it fails when an element is a type object, because str.__sizeof__ is the unbound method for computing the size of a string, not the method for computing the size of the str type object itself. Second, sys.getsizeof adds corrections for GC metadata that __sizeof__ doesn't account for.
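Both reasons can be checked directly with the standard library alone. This is a small sketch for a standard CPython build. Exact byte counts depend on the Python version and platform, so none are shown.

```python
import sys

# Reason 1: looking __sizeof__ up on a type object finds the method meant
# for *instances* of that type, so calling it with no argument fails.
try:
    str.__sizeof__()
except TypeError as exc:
    print("str.__sizeof__() failed:", exc)

# sys.getsizeof(str) instead uses type(str).__sizeof__, i.e. type.__sizeof__,
# which measures the str type object itself.
print("sys.getsizeof(str):", sys.getsizeof(str))

# Reason 2: for types that support garbage collection (lists, dicts, ...),
# sys.getsizeof adds the GC header on top of what __sizeof__ reports.
empty = []
print("[].__sizeof__():", empty.__sizeof__())
print("sys.getsizeof([]):", sys.getsizeof(empty))
```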
Answer 2
Score: 2
I fixed this; see https://github.com/pandas-dev/pandas/issues/51858.
Upgrade pandas to a version that includes the fix, and the same code should run without error:
import sys
import pandas as pd

df = pd.Series(str)
sys.getsizeof(df)  # with a fixed pandas this returns an int instead of raising
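If upgrading is not immediately possible, a function that must size arbitrary arguments can catch the TypeError itself. This is only a sketch. safe_getsizeof and BrokenSizeof are hypothetical names, and falling back to object.__sizeof__ (the shallow size of the instance) is an assumption about what is acceptable in that case. If a sentinel value is enough, a shorter option is the optional default argument of sys.getsizeof, which is returned instead of raising a TypeError.

```python
import sys

def safe_getsizeof(obj):
    """Return sys.getsizeof(obj), or a shallow fallback if that raises TypeError.

    Hypothetical helper: affected pandas versions raise TypeError from
    Series.__sizeof__ when an object-dtype Series holds a type such as `str`.
    """
    try:
        return sys.getsizeof(obj)
    except TypeError:
        # Fallback: object.__sizeof__ reports only the instance's own basic
        # size, ignoring anything it references (and the GC header).
        return object.__sizeof__(obj)


class BrokenSizeof:
    """Hypothetical stand-in that reproduces the failure without pandas."""

    def __sizeof__(self):
        # Same mistake as the buggy loop: the str method called with no argument.
        return str.__sizeof__()


broken = BrokenSizeof()
print(safe_getsizeof(broken))                          # shallow fallback size, no exception
print(sys.getsizeof(broken, -1))                       # built-in alternative: prints -1
print(safe_getsizeof("abc") == sys.getsizeof("abc"))   # True: normal objects unaffected
```

The fallback undercounts. For a pandas Series, it would report only the size of the Python wrapper object, not the data it holds.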