Why does an empty Python list consume less memory than an empty NumPy array?

Question

I thought that NumPy arrays always consume less memory than Python lists. However, when I tested this with an empty list, the pure Python list had 56 bytes and the NumPy array was 112 bytes in size. Why?

Answer 1

Score: 3

I reopened this because the duplicate focused on how np.reshape produces a view and changes what getsizeof sees. Here the issue is the size of a list versus an array.

Let me illustrate. (Posting an image of code is not good SO style; we prefer copy-and-paste code.)

Your list and array (with numpy imported as np):

In [458]: alist = [1,2,3,4,5,7,'a','b','c','@']
In [459]: alist
Out[459]: [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
In [460]: arr = np.array(alist)
In [461]: arr
Out[461]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U21')

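Note the dtype. Because the list mixes integers and strings, NumPy converted everything to a common string dtype, so the array contains strings, not numbers.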

In [462]: arr.nbytes
Out[462]: 840
In [463]: import sys
In [464]: sys.getsizeof(arr)
Out[464]: 952

getsizeof reports that 840, plus 112 bytes of 'overhead'. For regular arrays (ones that own their data), getsizeof gives a reasonable number, but it isn't really needed, since nbytes already gives the size of the data.
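To make the 'data plus overhead' split concrete, here is a minimal sketch. The variable names are only illustrative, and the exact overhead figure depends on the NumPy version and platform, so it is measured rather than hard-coded. It also shows the view case from the duplicate question: a view does not own its buffer, so getsizeof leaves the data bytes out.

```python
import sys
import numpy as np

arr = np.array(['1', '2', 'a', '@'], dtype='U21')   # owns its data buffer
empty = np.array([], dtype='U21')                    # same dtype, zero elements

overhead = sys.getsizeof(empty)            # fixed cost of a 1-D array object
print(arr.nbytes)                          # bytes in the data buffer
print(sys.getsizeof(arr) - arr.nbytes)     # what remains is the overhead
print(overhead)

view = arr[:2]                             # a view borrows arr's buffer
print(view.flags.owndata)                  # False
print(sys.getsizeof(view))                 # overhead only, no data bytes
```

So for an array that owns its data, getsizeof(arr) - arr.nbytes is the same fixed amount whatever the length.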

But for the list:

In [465]: sys.getsizeof(alist)
Out[465]: 136

Back to the array: we can account for its 840 bytes from the length and dtype. U21 means up to 21 characters per element, at 4 bytes per character:

In [466]: len(arr)
Out[466]: 10
In [467]: 4*21*10
Out[467]: 840
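The same arithmetic can be read off the dtype directly. This is a small sketch (the names wide, narrow and filler are made up for the example) that relies only on the itemsize, size and nbytes attributes:

```python
import numpy as np

wide = np.dtype('U21')      # up to 21 characters per element
narrow = np.dtype('U1')     # a single character per element
print(wide.itemsize)        # 84, i.e. 21 characters * 4 bytes
print(narrow.itemsize)      # 4

filler = np.zeros(10, dtype=wide)        # ten empty strings, still 84 bytes each
print(filler.size * filler.itemsize)     # 840
print(filler.nbytes)                     # 840
```

Every element takes the full itemsize, however short the actual string is.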

For the list, the 'overhead' is 56 bytes, and the rest is storage for pointers: 10 of them, at 8 bytes each.

In [468]: sys.getsizeof([])
Out[468]: 56
In [469]: 56+80
Out[469]: 136
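A quick way to check the 'overhead plus pointers' reading, assuming CPython. The lists are built with [None] * n so that they are allocated in one step, without spare growth space; the pointer size is taken from struct rather than assumed to be 8.

```python
import struct
import sys

pointer_size = struct.calcsize("P")    # 8 on a typical 64-bit build
empty_size = sys.getsizeof([])         # the fixed list 'overhead'

for n in (0, 1, 10, 100):
    items = [None] * n                 # allocated exactly, no growth space
    predicted = empty_size + n * pointer_size
    print(n, sys.getsizeof(items), predicted)
```

The two columns should agree for every n.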

Lists can also reserve memory as 'growth space'. getsizeof does not measure the memory used by the objects the list points to. In this case, the small integers already exist (CPython caches them), so they don't require any additional memory. The strings take up an extra 50 bytes each on the Python build used here; the exact figure varies between versions. Lists can store objects of various types, including other lists, dicts, arrays, etc., and getsizeof tells us nothing about those.
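Both points can be seen with the standard library alone. The sketch below assumes CPython; total_size is a hypothetical helper written for this example, not a library function, and it only follows lists, tuples, sets and dicts.

```python
import sys

def total_size(obj, seen=None):
    """Rough size of obj plus every distinct object it reaches.

    Naive: only descends into lists, tuples, sets, frozensets and dicts,
    and counts an object reached twice only once.
    """
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for key, value in obj.items():
            size += total_size(key, seen) + total_size(value, seen)
    elif isinstance(obj, (list, tuple, set, frozenset)):
        for item in obj:
            size += total_size(item, seen)
    return size

# Growth space: appending one item at a time makes the list over-allocate,
# so the reported size rises in steps instead of one pointer per append.
grown = []
sizes = []
for i in range(20):
    grown.append(i)
    sizes.append(sys.getsizeof(grown))
print(sizes)

# Shallow versus deep: getsizeof sees only the three pointers in nested.
nested = [[1, 2, 3], {"a": "b"}, "some text"]
print(sys.getsizeof(nested), total_size(nested))

# An object referenced twice is stored, and counted, once.
shared = "x" * 1000
pair = [shared, shared]
print(sys.getsizeof(pair), total_size(pair))
```

The helper ignores custom classes and arrays, so treat its result as a lower bound for anything beyond plain containers.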

The array could have been given a different dtype, with a reduction in memory:

In [470]: arr1 = np.array(alist, 'U1')
In [471]: arr1
Out[471]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U1')
In [472]: arr1.nbytes
Out[472]: 40
In [473]: sys.getsizeof(arr1)
Out[473]: 152

In sum, to get anything useful from getsizeof you have to understand how the object/class is stored, and just what that function measures. Neither is a trivial topic for a Python beginner. Still, a beginner should learn, sooner rather than later, how lists and arrays are stored.
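Tying this back to the original question, here is a rough comparison for purely numeric data, assuming CPython and an int64 array. list_bytes is a hypothetical helper for this example: it adds the size of each int object to the shallow list size, which getsizeof alone would leave out. The integers start at 1000 so that none of them come from CPython's small-integer cache.

```python
import sys
import numpy as np

def list_bytes(values):
    """Shallow list size plus the size of each element (elements assumed distinct)."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

for n in (0, 10, 1000):
    values = list(range(1000, 1000 + n))    # distinct int objects
    arr = np.array(values, dtype=np.int64)  # 8 bytes per element
    print(n, list_bytes(values), sys.getsizeof(arr))
```

The empty array loses only because its fixed overhead is larger; once there are elements, the per-element cost dominates and the array comes out smaller.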

huangapple
  • Posted on July 23, 2023 at 20:51:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76748337.html