July 23, 2023 20:51:24

Why does an empty Python list consume less memory than an empty NumPy array?

Question

I thought that NumPy arrays always consume less memory than Python lists. However, when I tested this with an empty list, the pure Python list had 56 bytes and the NumPy array was 112 bytes in size. Why?

Answer 1

Score: 3

I reopened this because the duplicate focused on how np.reshape produces a view and changes what getsizeof sees. Here the issue is the size of a list versus an array.

Let me illustrate. (Posting an image of code is not good SO style; we prefer copy-and-paste code.)

Your list and array:

In [458]: alist = [1,2,3,4,5,7,'a','b','c','@']
In [459]: alist
Out[459]: [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
In [460]: arr = np.array(alist)
In [461]: arr
Out[461]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U21')

Note the dtype. The array contains strings, not numbers.
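A minimal sketch of the dtype point, assuming NumPy is installed: when a list mixes ints and strings, `np.array` picks a fixed-width Unicode string dtype (kind `'U'`), so the numbers are stored as text. The exact width can differ by platform and NumPy version; the session above got `'<U21'`.

```python
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
arr = np.array(alist)

# Mixing ints and strings makes NumPy choose a Unicode string dtype,
# so every element, including the former ints, is stored as text.
print(arr.dtype)        # a '<U..' dtype; '<U21' in the session above
print(arr.dtype.kind)   # U
print(arr[0] == '1')    # True: the int 1 has become the string '1'
```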

In [462]: arr.nbytes
Out[462]: 840
In [463]: import sys
In [464]: sys.getsizeof(arr)
Out[464]: 952

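getsizeof reports those 840 bytes plus 112 bytes of 'overhead'. For regular arrays (ones that own their data, unlike the view in the duplicate), getsizeof gives a reasonable number, but it isn't really needed, since nbytes already gives the size of the data.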
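The relationship can be checked directly. This sketch assumes CPython and NumPy, and uses the size of an empty 1-D array as the baseline for the fixed per-array overhead (our assumption for illustration; that overhead depends on the number of dimensions). It also shows why the "regular arrays" qualifier matters: a view shares another array's buffer, so getsizeof does not count the data.

```python
import sys
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
arr = np.array(alist)

# Fixed cost of a 1-D array (header plus shape/strides): an empty 1-D
# array owns zero data bytes, so getsizeof reports only that cost.
overhead = sys.getsizeof(np.array([]))

# An array that owns its buffer: getsizeof = overhead + nbytes.
print(arr.flags.owndata)                            # True
print(sys.getsizeof(arr) - arr.nbytes == overhead)  # True

# A view shares big's buffer, so its 4000 data bytes are not counted.
big = np.zeros(1000)
view = big[::2]
print(view.flags.owndata)                 # False
print(sys.getsizeof(view) < view.nbytes)  # True
```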

But for the list:

In [465]: sys.getsizeof(alist)
Out[465]: 136

We can get the 840 bytes from the length and the dtype: each '<U21' element holds 21 characters, and NumPy stores each character in 4 bytes:

In [466]: len(arr)
Out[466]: 10
In [467]: 4*21*10
Out[467]: 840
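The same arithmetic can be read off the array's attributes. A small sketch, assuming NumPy; the dtype is requested explicitly as `'<U21'` so the numbers match the session above whatever default width the platform would choose.

```python
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
arr = np.array(alist, dtype='<U21')

chars_per_item = 21   # the 21 in 'U21'
bytes_per_char = 4    # NumPy stores Unicode strings as UCS-4 (UTF-32)

print(arr.itemsize)                                      # 84
print(arr.itemsize == chars_per_item * bytes_per_char)   # True
print(arr.nbytes == arr.size * arr.itemsize)             # True
print(arr.nbytes)                                        # 840
```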

For the list, the 'overhead' is 56 bytes, and the rest is storage for pointers: 10 of them, at 8 bytes each on a 64-bit build.

In [468]: sys.getsizeof([])
Out[468]: 56
In [469]: 56+80
Out[469]: 136
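A sketch of the same header-plus-pointers accounting, assuming CPython. It uses `[None] * 10` because, in CPython, list repetition allocates exactly the requested number of slots; other ways of building a list may leave spare slots, which getsizeof also counts. The pointer size comes from `struct.calcsize('P')` rather than a hard-coded 8.

```python
import struct
import sys

ptr_size = struct.calcsize('P')   # size of a C pointer: 8 on 64-bit builds
empty = sys.getsizeof([])         # fixed list header (56 in the session above)

# Exactly 10 slots, so no growth space is included here.
lst = [None] * 10
print(sys.getsizeof(lst) == empty + len(lst) * ptr_size)  # True
```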

Lists can also reserve memory as 'growth space' for future appends. getsizeof does not measure the memory used by the objects the list points to. In this case the small integers already exist (CPython caches them), so they don't require any additional memory. The strings take up about 50 bytes each. Lists can store objects of any type, including other lists, dicts, arrays, etc., and getsizeof tells us nothing about those.
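Both points can be observed in a short sketch, assuming CPython. The `slots` and `approx_total_size` helpers are hypothetical, written only for illustration: `slots` infers allocated capacity from getsizeof, and `approx_total_size` adds each distinct referenced object one level deep (it does not follow nested containers, and it also counts cached small ints that this list did not allocate).

```python
import struct
import sys

ptr_size = struct.calcsize('P')
empty = sys.getsizeof([])

def slots(lst):
    """Pointer slots the list has allocated (a CPython detail)."""
    return (sys.getsizeof(lst) - empty) // ptr_size

# Growth space: append() over-allocates, so capacity can exceed len().
grow = []
history = []
for i in range(20):
    grow.append(i)
    history.append((len(grow), slots(grow)))
print(history[:5])   # (length, capacity) pairs

def approx_total_size(lst):
    """Hypothetical helper: the list plus each distinct object it
    references, one level deep."""
    distinct = {id(x): x for x in lst}
    return sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in distinct.values())

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
print(sys.getsizeof(alist), approx_total_size(alist))
```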

The array could have been given a smaller dtype, reducing its memory use:

In [470]: arr1 = np.array(alist,'U1')
In [471]: arr1
Out[471]: array(['1', '2', '3', '4', '5', '7', 'a', 'b', 'c', '@'], dtype='<U1')
In [472]: arr1.nbytes
Out[472]: 40
In [473]: sys.getsizeof(arr1)
Out[473]: 152
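A sketch comparing the two dtypes side by side, assuming NumPy. Since both are 1-D arrays that own their data, their fixed overhead is identical, so the getsizeof difference equals the nbytes difference. The last lines illustrate a caveat worth knowing: a string dtype that is too narrow silently truncates longer strings.

```python
import sys
import numpy as np

alist = [1, 2, 3, 4, 5, 7, 'a', 'b', 'c', '@']
wide = np.array(alist, dtype='<U21')
narrow = np.array(alist, dtype='U1')   # every item here is one character

print(wide.nbytes, narrow.nbytes)                   # 840 40
print(sys.getsizeof(wide) - sys.getsizeof(narrow))  # 800

# A too-small string dtype keeps only the first character.
clipped = np.array(['abc'], dtype='U1')
print(clipped[0])                                   # a
```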

In sum, to get anything useful from getsizeof you have to understand how the object/class is stored, and exactly what that function measures. Neither is a trivial topic for a Python beginner. Still, a beginner should learn, sooner or later, how lists and arrays are stored.
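To connect this back to the question's premise: for an empty container getsizeof sees only the fixed header, and the ndarray header is the larger one. For non-empty numeric data the per-element cost dominates instead. A rough sketch, assuming CPython and NumPy; the list total adds the int objects it points to, which slightly overstates the list's own cost because CPython caches ints from -5 to 256.

```python
import sys
import numpy as np

n = 10_000
nums = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Empty containers: only the fixed headers are measured.
print(sys.getsizeof([]), sys.getsizeof(np.array([])))

# List: pointer storage plus the int objects the pointers refer to.
list_total = sys.getsizeof(nums) + sum(sys.getsizeof(x) for x in nums)
# Array: header plus 8 bytes per int64, stored inline (it owns its data).
arr_total = sys.getsizeof(arr)

print(list_total, arr_total)
print(arr_total < list_total)   # True
```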
