Why do Python objects without circular references still get removed by garbage collection?
Question
If a function creates a lot of objects in a loop that are not referenced elsewhere, they will be removed immediately due to Python's reference counting.
If I instead store the objects in a list, they will be removed by garbage collection when the function exits. The objects do not have circular references. Why are they not removed by reference counting when the function ends and the list is destroyed?
As an example, I have the following program with two scenarios: I call the run function once with with_append = True and once with with_append = False.
In the first scenario, garbage collection kicks in and the whole program takes much longer to finish. In the second scenario, no garbage collection is active. You can see this by running the program with py-spy and using the --native option.
Below is the example program and the output for both cases.

    import time

    COUNT = 10000000

    class User:
        def __init__(self, name):
            self.name = name

    def run(with_append):
        l = []
        for i in range(COUNT):
            u = User(f"user {i}")
            if with_append:
                l.append(u)

    ts = time.time()
    run(with_append=True)
    print("function completed - ", time.time() - ts)

Scenario 1
Call run with param: with_append = True
Output of: python demo.py
function completed - 10.940133094787598
Output of: py-spy top --native -- python demo.py
    Total Samples 1000
    GIL: 100.00%, Active: 100.00%, Threads: 1

       %Own   %Total  OwnTime  TotalTime  Function (filename)
     53.00%   53.00%    4.51s      4.51s  gc_collect_main (libpython3.10.so.1.0)
     38.00%  100.00%    4.41s     10.00s  run (demo3.py)
      9.00%   10.00%    1.07s      3.13s  __init__ (demo3.py)
      0.00%    0.00%   0.010s     0.010s  0x7f91a5b2a746 (libc-2.31.so)
      0.00%  100.00%   0.000s     10.00s  <module> (demo3.py)
      0.00%   53.00%   0.000s      4.51s  gc_collect_with_callback (libpython3.10.so.1.0)

Half of the time is spent in gc_collect_main.
Scenario 2
Call run with param: with_append = False
Output of: python demo.py
function completed - 4.351471424102783
Output of: py-spy top --native -- python demo.py
    Total Samples 400
    GIL: 100.00%, Active: 100.00%, Threads: 1

       %Own   %Total  OwnTime  TotalTime  Function (filename)
     85.00%  100.00%    3.38s      4.00s  run (demo3.py)
     13.00%   14.00%   0.560s     0.600s  __init__ (demo3.py)
      1.00%    1.00%   0.020s     0.020s  unicode_dealloc (libpython3.10.so.1.0)
      1.00%    1.00%   0.020s     0.020s  0x7fe05fca5742 (libc-2.31.so)
      0.00%    0.00%   0.010s     0.010s  0x7fe05fca5746 (libc-2.31.so)
      0.00%    0.00%   0.010s     0.010s  0x7fe05fca5724 (libc-2.31.so)
      0.00%  100.00%   0.000s      4.00s  <module> (demo3.py)

Why is reference counting not used to free memory in both cases? And why did garbage collection take so much more time to remove the objects?
Answer 1
Score: 1
Most likely, in both scenarios, objects are deleted by reference counting.
In Scenario 1, gc_collect_main is automatically called in the middle of the loop to detect collectable objects with circular references. No objects are actually deleted by this in Scenario 1. However, this detection runs many thousands of times, and it has to examine the objects that are still alive, so it adds up to a significant cost overall.
This automatic detection is triggered when the number of container-object allocations minus the number of deallocations exceeds a threshold (see gc.get_threshold()). In Scenario 1 every User stays alive in the list, so that count keeps growing. In Scenario 2 each object is deleted by reference counting at the end of its iteration, so the count never reaches the threshold and the detection is not executed during the loop.
You can put the following code at the top of your script to check when the detection was performed.

    import gc
    gc.set_debug(gc.DEBUG_STATS)

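
As a quieter alternative to the debug output, the trigger described above can also be observed by counting collector runs through gc.callbacks. The following is only a sketch: the helper name count_collections is made up for illustration, COUNT is reduced far below the question's 10,000,000 so it finishes quickly, and the exact numbers depend on the interpreter version and its thresholds.

```python
import gc

COUNT = 200_000  # assumption: far smaller than the question's 10,000,000


class User:
    def __init__(self, name):
        self.name = name


def count_collections(with_append):
    """Run the question's loop and count automatic collections per generation.

    Hypothetical helper, not part of the original post.
    """
    counts = {0: 0, 1: 0, 2: 0}

    def on_gc(phase, info):
        # The collector invokes callbacks with phase "start" and "stop";
        # count only the starts so each collection is counted once.
        if phase == "start":
            counts[info["generation"]] += 1

    gc.callbacks.append(on_gc)
    try:
        kept = []
        for i in range(COUNT):
            u = User(f"user {i}")
            if with_append:
                kept.append(u)
    finally:
        gc.callbacks.remove(on_gc)
    return counts


if __name__ == "__main__":
    print("thresholds:", gc.get_threshold())
    print("with_append=True :", count_collections(True))
    print("with_append=False:", count_collections(False))
```

With the objects kept in the list, the counts should be clearly higher than in the run where each object is dropped at once.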
Alternatively, explicitly disabling detection during the loop should significantly improve performance.

    import gc

    def run(with_append):
        gc.disable()
        l = []
        for i in range(COUNT):
            u = User(f"user {i}")
            if with_append:
                l.append(u)
        gc.enable()

Note that this is mentioned in the official documentation.
> Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles.
Obviously, Python cannot automatically determine whether a circular reference exists (i.e., whether garbage collection is needed), so if this is a problem, you must explicitly disable it in this way.
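
If disabling the collector by hand feels fragile (an exception inside the loop would leave it switched off), one option is to wrap the pause in a context manager. This is a sketch rather than a recommended API: gc_paused is a hypothetical name, and run takes a count argument here only to keep the example self-contained.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Temporarily disable the cyclic collector, then restore its prior state.

    Hypothetical helper; reference counting keeps working while it is active.
    """
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        # Re-enable only if the collector was on before we paused it.
        if was_enabled:
            gc.enable()


class User:
    def __init__(self, name):
        self.name = name


def run(count, with_append):
    kept = []
    with gc_paused():
        for i in range(count):
            u = User(f"user {i}")
            if with_append:
                kept.append(u)
    return len(kept)
```

Restoring the previous state instead of calling gc.enable() unconditionally means the helper does not switch the collector on for callers that had deliberately turned it off.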
Answer 2
Score: 0
Whenever you reassign u, the reference count of the old User object drops to zero, so that object is freed on the fly, before the loop ends. When the objects are stored in the list, they cannot be freed during execution; they are released only at the end of the function, because the reference count of the list does not reach zero until the list goes out of scope.
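
To check that it really is reference counting, and not the cyclic collector, that frees the list's contents, a small experiment with weak references can help. This sketch assumes CPython (other implementations may free objects later), and freed_without_gc is a hypothetical helper name.

```python
import gc
import weakref


class User:
    def __init__(self, name):
        self.name = name


def freed_without_gc():
    """Return True if dropping the list frees its items while the cyclic GC is off."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector for the duration of the check
    try:
        users = [User(f"user {i}") for i in range(3)]
        # Weak references do not keep the objects alive; they turn dead
        # (return None when called) once the object has been freed.
        refs = list(map(weakref.ref, users))
        del users  # the list's refcount hits zero, then each User's does too
        return all(r() is None for r in refs)
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("freed by reference counting alone:", freed_without_gc())
```

On CPython the check is expected to succeed, which matches the first answer's point that the collector's cost in Scenario 1 comes from looking for cycles, not from freeing the objects.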