Why does sys.getsizeof fail on a pandas Series or DataFrame when it holds a type?
Question
Python's sys.getsizeof fails on a Series that holds a type as data. I have a function that needs to calculate the size of any given argument, which I do with sys.getsizeof. This is a problem because sys.getsizeof fails unexpectedly for these kinds of Series and DataFrames. Is there a way to avoid this failure in sys.getsizeof?
import sys
import pandas as pd
df = pd.Series(str)
sys.getsizeof(df)
TypeError: descriptor '__sizeof__' of 'str' object needs an argument
Answer 1
Score: 3
This is a Pandas bug.
Pandas makes the unusual decision of trying to compute a "deep" sizeof, including all element sizes, rather than just the memory consumption directly attributable to the Series instance itself. The __sizeof__ implementation for a Series instance eventually hits a loop that tries to call __sizeof__ on the elements:
for i in range(n):
    size += arr[i].__sizeof__()
return size
but calling __sizeof__ like this is incorrect. It should really call sys.getsizeof(arr[i]).
Calling __sizeof__ like this is wrong for two reasons: first, as you've seen, it fails when an element is a type object, because str.__sizeof__ is the unbound method for computing the size of a string, not the method for computing the size of the str type object itself. Second, sys.getsizeof adds corrections for GC metadata that __sizeof__ doesn't account for.
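Both points can be checked without pandas. The following is a minimal sketch that assumes CPython; the variable names are only illustrative. It calls str.__sizeof__() directly, as the loop above effectively does when an element is the str type, and then compares sys.getsizeof with __sizeof__ on a list to expose the GC overhead.

```python
import sys

# Looking up __sizeof__ on the str type finds the method meant for str
# instances, so calling it with no argument raises TypeError.
try:
    str.__sizeof__()
    direct_call_error = None
except TypeError as exc:
    direct_call_error = exc

# sys.getsizeof looks the method up on type(obj) instead, so it also
# works when obj is itself a type object.
size_of_str_type = sys.getsizeof(str)

# For GC-tracked objects such as lists, sys.getsizeof adds the
# garbage-collector header that __sizeof__ leaves out (CPython behaviour).
empty_list = []
gc_overhead = sys.getsizeof(empty_list) - empty_list.__sizeof__()

print("direct call failed with:", direct_call_error)
print("sys.getsizeof(str):", size_of_str_type)
print("extra bytes added by sys.getsizeof for a list:", gc_overhead)
```

The exact numbers depend on the interpreter version and platform, so none are quoted here.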
Answer 2
Score: 2
I fixed this in pandas; see https://github.com/pandas-dev/pandas/issues/51858.
Upgrade pandas and you should be able to run the same code with no problem:
import sys
import pandas as pd
df = pd.Series(str)
sys.getsizeof(df)
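If upgrading is not possible, one way to keep a generic size function from crashing is to catch the TypeError and report the size as unknown. This is only a sketch: safe_getsizeof and DeepSizedBox are hypothetical names, not part of pandas or the standard library, and DeepSizedBox is a stand-in that imitates the element loop quoted in the first answer so the example runs without pandas.

```python
import sys


def safe_getsizeof(obj):
    """Return sys.getsizeof(obj), or None if its __sizeof__ raises TypeError."""
    try:
        return sys.getsizeof(obj)
    except TypeError:
        # The size could not be computed (for example, the buggy deep-size
        # loop hit a type object), so report "unknown" instead of failing.
        return None


class DeepSizedBox:
    """Hypothetical stand-in that mimics the buggy deep-size loop."""

    def __init__(self, items):
        self.items = list(items)

    def __sizeof__(self):
        size = object.__sizeof__(self)
        for item in self.items:
            # Same mistake as the quoted loop: breaks when item is a type.
            size += item.__sizeof__()
        return size


size_with_strings = safe_getsizeof(DeepSizedBox(["a", "b"]))
size_with_type = safe_getsizeof(DeepSizedBox([str]))
print(size_with_strings, size_with_type)
```

Returning None rather than a number makes it explicit to callers that the size is unknown, not zero.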