Why does an empty Python list consume less memory than an empty NumPy array?
Question
I thought that NumPy arrays always consume less memory than Python lists. However, when I tested this with empty ones, the pure Python list took 56 bytes while the NumPy array took 112 bytes. Why?
Answer 1
Score: 3
I reopened this because the duplicate focused on how np.reshape produces a view and changes what getsizeof sees. Here the issue is the size of a list versus an array.
Let me illustrate. (Posting an image of code is not good SO style; we prefer copy-and-paste code.)
Your list and array:
In [458]: alist = [1,2,3,4,5,7,'a','b','c','@']
In [459]: alist
Out[459]: [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
In [460]: arr = np.array(alist)
In [461]: arr
Out[461]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U21')
Note the dtype. Because the list mixes integers and strings, NumPy converted everything to one string dtype, so the array contains strings, not numbers.
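A quick way to confirm this, as a minimal sketch assuming NumPy is installed: dtype.kind shows that the mixed list was coerced to a Unicode string dtype, while an all-integer list keeps a numeric one. The exact string width NumPy picks is only shown as an example, not relied on.

```python
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']

# Mixed ints and strings: NumPy picks one common dtype, a Unicode string ('U')
arr = np.array(alist)
print(arr.dtype, arr.dtype.kind)   # e.g. <U21 U

# The integer 1 is now stored as the string '1'
print(arr[0] == '1')               # True

# An all-integer list keeps an integer dtype instead
nums = np.array([1, 2, 3, 4, 5, 7])
print(nums.dtype.kind)             # i
```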
In [462]: arr.nbytes
Out[462]: 840
In [463]: import sys
In [464]: sys.getsizeof(arr)
Out[464]: 952
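getsizeof reports those 840 bytes plus 112 bytes of 'overhead'. For regular arrays (ones that own their data rather than being views), getsizeof gives a reasonable number, but it isn't really needed, since nbytes already tells us the size of the data.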
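To see the "data plus fixed overhead" split, here is a minimal sketch assuming NumPy is installed. array_overhead is a hypothetical helper, not a NumPy function, and the exact overhead value depends on the build, so only relationships are checked. It also touches the question itself: an empty array has nbytes == 0, so its getsizeof is nothing but that overhead.

```python
import sys
import numpy as np

def array_overhead(a):
    """Hypothetical helper: bytes sys.getsizeof reports beyond the data buffer.

    Only meaningful for arrays that own their data (a.flags.owndata is True).
    """
    return sys.getsizeof(a) - a.nbytes

small = np.zeros(10)       # 1-D, owns an 80-byte buffer
large = np.zeros(10_000)   # 1-D, owns an 80,000-byte buffer
empty = np.array([])       # 1-D, nbytes == 0

# Same ndim, so the same fixed overhead, whatever the length
print(array_overhead(small), array_overhead(large))
# An empty array is nothing but that overhead
print(sys.getsizeof(empty), empty.nbytes)

# reshape() of a contiguous array returns a view sharing large's buffer;
# getsizeof does not count that buffer for a view
view = large.reshape(100, 100)
print(view.flags.owndata)                 # False
print(sys.getsizeof(view) < view.nbytes)  # True
```

The view case is why getsizeof is only "reasonable" for arrays that own their data.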
But for the list:
In [465]: sys.getsizeof(alist)
Out[465]: 136
We can get the 840 bytes from the length and dtype: each '<U21' element holds 21 characters at 4 bytes per character:
In [466]: len(arr)
Out[466]: 10
In [467]: 4*21*10
Out[467]: 840
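The same arithmetic in code, as a small sketch assuming NumPy is installed. The dtype is passed explicitly here so the numbers do not depend on what NumPy would infer; the constant names are just for illustration.

```python
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
# dtype passed explicitly so the arithmetic does not depend on inference
arr = np.array(alist, dtype='U21')

BYTES_PER_CHAR = 4     # NumPy 'U' dtypes store each character in 4 bytes (UCS-4)
CHARS_PER_ITEM = 21    # the width in 'U21'

print(arr.dtype.itemsize)                          # 84
print(len(arr) * CHARS_PER_ITEM * BYTES_PER_CHAR)  # 840
print(arr.nbytes)                                  # 840, i.e. arr.size * arr.dtype.itemsize
```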
For the list, the 'overhead' is 56 bytes, and the rest is storage for pointers: 10 of them, at 8 bytes each on a 64-bit build.
In [468]: sys.getsizeof([])
Out[468]: 56
In [469]: 56+80
Out[469]: 136
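A sketch of the same header-plus-pointers arithmetic, assuming CPython. It relies on an implementation detail: list.copy() allocates exactly one slot per element with no spare room. On recent CPython versions the literal alist itself may carry a few spare slots, which is why the copy is measured instead of alist.

```python
import struct
import sys

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
PTR_SIZE = struct.calcsize("P")   # bytes per pointer: 8 on a 64-bit build
EMPTY = sys.getsizeof([])         # fixed list header (56 in the session above)

# list.copy() allocates exactly len(alist) slots in CPython, with no spare room
exact = alist.copy()
print(sys.getsizeof(exact))             # header + one pointer per item
print(EMPTY + len(exact) * PTR_SIZE)    # same number
```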
Lists can also reserve memory for 'growth space', spare slots allocated ahead of future appends. getsizeof does not measure the memory used by the objects the list points to. In this case the small integers already exist (CPython caches them), so they don't require any additional memory. The strings take up roughly an extra 50 bytes each. Lists can store objects of various types, including other lists, dicts, arrays, etc., and getsizeof tells us nothing about those.
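Both points can be checked directly. This is a minimal sketch assuming CPython; exact sizes vary by version, so only relationships are tested. one_level_size is a hypothetical helper, not part of the standard library, and it only goes one level deep.

```python
import struct
import sys

PTR_SIZE = struct.calcsize("P")
EMPTY = sys.getsizeof([])

# 1) Growth space: append() over-allocates, so the size grows in steps
lst, sizes = [], []
for i in range(20):
    lst.append(i)
    sizes.append(sys.getsizeof(lst))
print(sizes)   # runs of equal values, then a jump

# 2) getsizeof ignores the objects a list points to
big = list(range(100_000))
small = "x"
outer_big, outer_small = [big], [small]
print(sys.getsizeof(outer_big) == sys.getsizeof(outer_small))  # True

def one_level_size(seq):
    """Hypothetical helper: container plus each distinct direct item.

    Not recursive, and it also counts shared objects such as cached
    small ints, which cost nothing extra in practice.
    """
    seen = set()
    total = sys.getsizeof(seq)
    for item in seq:
        if id(item) not in seen:
            seen.add(id(item))
            total += sys.getsizeof(item)
    return total

print(one_level_size(outer_big))   # now includes big's own pointer array
```

Even one_level_size stops at big's pointer array and does not count the 100,000 int objects inside it, which illustrates how much getsizeof leaves out.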
The array could have been given a different dtype, with a reduction in memory:
In [470]: arr1 = np.array(alist,'U1')
In [471]: arr1
Out[471]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U1')
In [472]: arr1.nbytes
Out[472]: 40
In [473]: sys.getsizeof(arr1)
Out[473]: 152
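Comparing the two dtypes side by side, as a small sketch assuming NumPy is installed: only the data part changes with the dtype, while the fixed per-array overhead stays the same (952 - 840 and 152 - 40 are both 112 in the session above).

```python
import sys
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']

results = {}
for dtype in ('U21', 'U1'):
    a = np.array(alist, dtype=dtype)
    overhead = sys.getsizeof(a) - a.nbytes   # fixed cost, independent of dtype
    results[dtype] = (a.dtype.itemsize, a.nbytes, overhead)
    print(dtype, results[dtype])   # (itemsize, nbytes, overhead)
```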
In sum, to get anything useful from getsizeof you have to understand how the object/class is stored, and just what that function measures. Neither is a trivial topic for a Python beginner. Still, a beginner should learn, sooner or later, how lists and arrays are stored.