Why do Python objects without circular references still get removed by garbage collection?

Question

If a function creates a lot of objects in a loop that are not referenced elsewhere, they will be removed immediately due to Python's reference counting.
If I store the objects in a list instead, the objects appear to be removed by garbage collection when the function exits. The objects do not have circular references. Why are they not removed by reference counting when the function ends and the list is removed?
As an example I have the following program with two scenarios where I call the run function once with the parameter with_append = True and the other time with with_append = False.
In the first scenario, the garbage collector kicks in and the whole programme takes much longer to finish. In the second scenario, no garbage collection is active. You can see this by running the program under py-spy with the --native option.
Below is the example programme and the output for both cases.

```python
import time

COUNT = 10000000


class User:
    def __init__(self, name):
        self.name = name


def run(with_append):
    l = []
    for i in range(COUNT):
        u = User(f"user {i}")
        if with_append:
            l.append(u)


ts = time.time()
run(with_append=True)
print("function completed - ", time.time() - ts)
```

Scenario 1

Call run with param: with_append = True

Output of: python demo.py

function completed -  10.940133094787598

Output of: py-spy top --native -- python demo.py

Total Samples 1000
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                  
 53.00%  53.00%    4.51s     4.51s   gc_collect_main (libpython3.10.so.1.0)
 38.00% 100.00%    4.41s    10.00s   run (demo3.py)
  9.00%  10.00%    1.07s     3.13s   __init__ (demo3.py)
  0.00%   0.00%   0.010s    0.010s   0x7f91a5b2a746 (libc-2.31.so)
  0.00% 100.00%   0.000s    10.00s   <module> (demo3.py)
  0.00%  53.00%   0.000s     4.51s   gc_collect_with_callback (libpython3.10.so.1.0)

Half of the time is spent in gc_collect_main.

Scenario 2

Call run with param: with_append = False

Output of: python demo.py

function completed -  4.351471424102783

Output of: py-spy top --native -- python demo.py

Total Samples 400
GIL: 100.00%, Active: 100.00%, Threads: 1

  %Own   %Total  OwnTime  TotalTime  Function (filename)                                                                                                                                                  
 85.00% 100.00%    3.38s     4.00s   run (demo3.py)
 13.00%  14.00%   0.560s    0.600s   __init__ (demo3.py)
  1.00%   1.00%   0.020s    0.020s   unicode_dealloc (libpython3.10.so.1.0)
  1.00%   1.00%   0.020s    0.020s   0x7fe05fca5742 (libc-2.31.so)
  0.00%   0.00%   0.010s    0.010s   0x7fe05fca5746 (libc-2.31.so)
  0.00%   0.00%   0.010s    0.010s   0x7fe05fca5724 (libc-2.31.so)
  0.00% 100.00%   0.000s     4.00s   <module> (demo3.py)

Why is reference counting not used to free memory in both cases? And why did garbage collection take so much more time to remove the objects?

Answer 1

Score: 1

Most likely, in both scenarios, objects are deleted by reference counting.
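The claim above can be checked with a small sketch. It is not part of the original post: the helper name `run_and_count_freed` and the use of `weakref.finalize` are assumptions made for illustration, and the behaviour relies on CPython's reference counting. With the cyclic collector switched off, every `User` stored in the list should still be finalized as soon as the function that owns the list returns.

```python
import gc
import weakref


class User:
    def __init__(self, name):
        self.name = name


def run_and_count_freed(count=1000):
    """Return how many User objects were freed while the cyclic GC was off."""
    freed = 0

    def note_freed():
        # Called by weakref.finalize when a User object is deallocated.
        nonlocal freed
        freed += 1

    def build():
        kept = []
        for i in range(count):
            u = User(f"user {i}")
            weakref.finalize(u, note_freed)
            kept.append(u)
        # 'kept' and 'u' are dropped when this function returns.

    was_enabled = gc.isenabled()
    gc.disable()  # only reference counting can free objects now
    try:
        build()
        return freed
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("freed without the cyclic collector:", run_and_count_freed())
```

If the returned number equals `count`, the list's contents were released by reference counting alone, which is what this answer suggests.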

In Scenario 1, gc_collect_main is automatically called in the middle of the loop to detect collectable objects with circular references. No objects are actually deleted by this in Scenario 1. However, this detection is performed tens of thousands of times, so it seems to be time consuming overall.

This automatic detection is triggered by an increase in the number of objects (I am not absolutely sure about this), so it is not executed during the loop in Scenario 2, where each object is deleted by reference counting at the end of its loop iteration.
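The `gc` module documentation describes the trigger in these terms: a collection starts when the number of container allocations minus deallocations exceeds the first value returned by `gc.get_threshold()` (on CPython 3.10 the defaults are `(700, 10, 10)`). The sketch below is an illustration rather than part of the original answer; `count_collections` is a hypothetical helper that uses `gc.callbacks` to count how many collector passes start during the loop in each scenario.

```python
import gc


class User:
    def __init__(self, name):
        self.name = name


def count_collections(with_append, count=200_000):
    """Return how many cyclic GC passes start while the loop runs."""
    passes = 0

    def on_gc(phase, info):
        # The collector calls this with phase "start" and then "stop".
        nonlocal passes
        if phase == "start":
            passes += 1

    gc.callbacks.append(on_gc)
    try:
        kept = []
        for i in range(count):
            u = User(f"user {i}")
            if with_append:
                kept.append(u)
    finally:
        gc.callbacks.remove(on_gc)
    return passes


if __name__ == "__main__":
    print("thresholds:", gc.get_threshold())
    print("passes with append:   ", count_collections(True))
    print("passes without append:", count_collections(False))
```

When the objects are kept in the list, the live object count keeps growing and crosses the threshold again and again; when each object is dropped right away, allocations and deallocations cancel out and the threshold should rarely, if ever, be reached.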

You can put the following code at the top of your script to check when the detection was performed.

```python
import gc

gc.set_debug(gc.DEBUG_STATS)
```
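If the debug output is too noisy, `gc.get_stats()` offers a quieter view of the same information. The following is a sketch under assumptions: `gc_stats_delta` is a hypothetical helper, not something from the original answer. It reports, per generation, how many collections ran during the loop and how many objects those collections actually reclaimed.

```python
import gc


class User:
    def __init__(self, name):
        self.name = name


def gc_stats_delta(count=200_000):
    """Return per-generation GC activity caused by building the list."""
    gc.collect()  # clear earlier garbage so it is not attributed to the loop
    before = gc.get_stats()
    kept = []
    for i in range(count):
        kept.append(User(f"user {i}"))
    after = gc.get_stats()
    return [
        {
            "collections": a["collections"] - b["collections"],
            "collected": a["collected"] - b["collected"],
        }
        for b, a in zip(before, after)
    ]


if __name__ == "__main__":
    for generation, delta in enumerate(gc_stats_delta()):
        print(generation, delta)
```

A large `collections` value combined with a `collected` value at or near zero would match the explanation above: the collector runs often but finds nothing to free.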

Alternatively, explicitly disabling detection during the loop should significantly improve performance.

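```python
import gc

def run(with_append):
    # Pause the cyclic collector for the duration of the loop; reference
    # counting still frees objects as soon as their count drops to zero.
    gc.disable()
    try:
        l = []
        for i in range(COUNT):
            u = User(f"user {i}")
            if with_append:
                l.append(u)
    finally:
        # Re-enable the collector even if the loop raises.
        gc.enable()
```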
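When the same pattern is needed in several places, it can be wrapped in a context manager. This is only a sketch: `gc_paused` is a hypothetical name, not an API of the `gc` module. It restores the collector to whatever state it was in before, so it does not switch the collector on for a caller that had deliberately turned it off.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic collector, then restore its prior state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    with gc_paused():
        # Allocation-heavy work goes here; no automatic GC passes run.
        data = [{"id": i} for i in range(100_000)]
    print("collector enabled again:", gc.isenabled())
```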

Note that this is mentioned in the official documentation.

> Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.

Obviously, Python cannot automatically determine whether a circular reference exists (i.e., whether garbage collection is needed), so if this is a problem, you must explicitly disable it in this way.
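For contrast, here is a minimal sketch of the case the collector exists for. The `Node` class and the helper names are made up for this illustration. Two objects that refer to each other never reach a reference count of zero, so they stay alive until a collector pass finds them.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.other = None


def make_cycle():
    """Create two objects that reference each other; return a weak reference."""
    a, b = Node(), Node()
    a.other, b.other = b, a
    return weakref.ref(a)


def cycle_survives_refcounting():
    """Return (alive before gc.collect(), alive after gc.collect())."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out an automatic pass between the two checks
    try:
        ref = make_cycle()
        alive_before = ref() is not None
        gc.collect()  # an explicit pass still works while disabled
        alive_after = ref() is not None
        return alive_before, alive_after
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(cycle_survives_refcounting())
```

This is why disabling the collector is only safe when the program is known not to create cycles, or when `gc.collect()` is called manually at suitable points.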

Answer 2

Score: 0

Whenever you reassign u, the reference count for the old User object drops to zero, making it eligible for garbage collection, so before the loop ends it's able to collect the User object contained by u on the fly. So it waits until the end of the function to perform garbage collection on the list instead of doing it during the execution, because the reference count for the list does not reach zero until the list falls out of scope.

huangapple
  • Posted on April 19, 2023 at 18:33:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053484.html