Why is it impossible for the compiler to detect memory leaks caused specifically by circular references?

Question

There are certain cases where we leak memory on purpose. In these cases we explicitly call something like Box::leak() or mem::forget().
Also, sometimes adding items to a collection without ever removing them is exactly what the program is meant to do.
However, memory leaks caused by other things, such as using a strong reference where a weak one was needed, are hardly ever (I would say never) intentional.
Why doesn't the compiler keep track of every reference-counted smart pointer and refuse to compile if any of their counts is not 0 when the whole program exits?
That is, a program would only compile if every Arc and Rc has a reference count of 0 after the drop stage of the main function.
If this is impossible to implement, I would like to know why.
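
For concreteness, here is a minimal sketch of the situation the question describes, using a hypothetical `Node` type that is not from the original post: two nodes linked with strong `Rc` references in both directions are never freed, while making the back-link a `Weak` lets both be released. The returned `Weak` handle exists only to observe whether node `a` was deallocated.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node type: `next` is a strong link, `prev` a weak back-link.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
    prev: RefCell<Weak<Node>>,
}

fn new_node() -> Rc<Node> {
    Rc::new(Node {
        next: RefCell::new(None),
        prev: RefCell::new(Weak::new()),
    })
}

// a -> b and b -> a, both strong: each keeps the other's count above 0.
// Returns a Weak handle so the caller can check whether `a` was freed.
fn strong_cycle() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.next.borrow_mut() = Some(Rc::clone(&a)); // strong back-link: leaks
    Rc::downgrade(&a)
} // `a` and `b` go out of scope here, but both strong counts stay at 1

// a -> b strong, b -> a weak: dropping `a` frees `a`, which then frees `b`.
fn weak_back_link() -> Weak<Node> {
    let a = new_node();
    let b = new_node();
    *a.next.borrow_mut() = Some(Rc::clone(&b));
    *b.prev.borrow_mut() = Rc::downgrade(&a); // weak back-link: no strong cycle
    Rc::downgrade(&a)
}
```

After both local handles are gone, `strong_cycle().upgrade()` is still `Some(_)` (the leak the question would like the compiler to reject), whereas `weak_back_link().upgrade()` is `None`.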

Answer 1

Score: 1

To know if the program will even terminate, we need to solve the halting problem. Unfortunately, Alan Turing proved that there is no general solution. That alone precludes any analysis of this kind.
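
As an illustration (not a proof of undecidability), the sketch below makes a leak depend on the outcome of a loop. The Collatz iteration is used only because it is a familiar loop whose termination for every input is an unproven conjecture; `collatz_steps` and `maybe_leak` are hypothetical names. To accept or reject this program under the proposed rule, the compiler would have to reason about that loop for every possible `n`, including whether it halts at all.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical node that may point back at itself.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Number of Collatz steps from n down to 1. Nobody has proved this loop
// terminates for every n >= 1 (and for n = 0 it never terminates).
fn collatz_steps(mut n: u64) -> u64 {
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

// Returns true if the node was leaked, i.e. it is still alive after its
// only local handle has been dropped.
fn maybe_leak(n: u64) -> bool {
    let node = Rc::new(Node { next: RefCell::new(None) });
    if collatz_steps(n) % 2 == 1 {
        // Self-cycle: the node keeps its own strong count at 1 forever.
        *node.next.borrow_mut() = Some(Rc::clone(&node));
    }
    let watcher = Rc::downgrade(&node);
    drop(node);
    watcher.upgrade().is_some()
}
```

For example, `n = 3` takes 7 steps (odd, so the node leaks) and `n = 4` takes 2 steps (even, so it is freed).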

More generally, the compiler doesn't know what will happen at runtime. It might depend on various inputs, such as command line arguments, environment variables, the state of the filesystem (what directories/files exist and what they contain), the results of network operations such as fetching information from a remote API or database, and so on. Collectively, we can call all of this information external to the program its "environment." The program's environment can influence what the program will do, and memory leaks might happen in some environments but not others.
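
A minimal sketch of an environment-dependent leak, assuming a hypothetical convention where the first command-line argument `cycle` switches on a self-referencing node. The same compiled binary leaks in one environment and not in another, so there is no single compile-time answer to "does this program leak?".

```rust
use std::cell::RefCell;
use std::env;
use std::rc::Rc;

// Hypothetical node that may point back at itself.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

// Whether the node leaks depends only on `mode`, which is known at runtime.
// Returns true if the node outlives its last local handle.
fn run(mode: Option<&str>) -> bool {
    let node = Rc::new(Node { next: RefCell::new(None) });
    if mode == Some("cycle") {
        *node.next.borrow_mut() = Some(Rc::clone(&node)); // self-cycle
    }
    let watcher = Rc::downgrade(&node);
    drop(node);
    watcher.upgrade().is_some()
}

// Entry point driven by the environment: the first command-line argument
// ("cycle" is a hypothetical value chosen for this example).
fn run_from_args() -> bool {
    let first_arg = env::args().nth(1);
    run(first_arg.as_deref())
}
```

`run(Some("cycle"))` leaks, while `run(None)` and any other argument do not; `run_from_args` simply forwards whatever the process was started with.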

Further, in Rust, all running threads will terminate automatically when main() returns, and this termination is forced; any values owned by the threads will not have their destructors run, so any Arcs they hold won't get a chance to be properly cleaned up. This means any program that still has threads running when main() returns could very easily cause leak checking to generate false positives.
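
A sketch of that false positive, with `main_like` as a hypothetical stand-in for `main`: a detached worker thread holds an `Arc` clone and is still busy when `main` would return, so the strong count at that point is 2 even though there is no reference cycle at all. Under the rule proposed in the question, this program would be rejected.

```rust
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for `main`: returns the strong count of `shared`
// at the moment `main` would return.
fn main_like() -> usize {
    let shared = Arc::new(vec![1, 2, 3]);
    let worker_copy = Arc::clone(&shared);

    // Detached worker (never joined) that holds its clone while it works.
    thread::spawn(move || {
        let _held = worker_copy;
        // Simulates a long-running worker that is still busy when `main` returns.
        thread::sleep(Duration::from_secs(3600));
    });

    // If this were `main`, returning now would end the process; the worker
    // would be killed without dropping `_held`, so the count never reaches 0.
    Arc::strong_count(&shared)
}
```

Joining the worker before returning would bring the count back to 1 and then 0, but nothing in the language forces a program to do that.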

Static analysis of this kind simply isn't feasible.

In fact, one of the reasons why we even have reference-counted types is to handle situations where the compiler can't prove the safety of the program. These types fundamentally hide the "true" lifetime of the contained value from the compiler. Asking the compiler to figure out the lifetime of values stored within a type that hides lifetimes from the compiler is a contradiction.
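
A small sketch of that point, using a hypothetical function `freed_by`: two `Rc` handles share one `String`, and a runtime flag decides which handle is dropped last. The value is freed by whichever drop brings the strong count to 0, something the type system does not track; the `Weak` watcher only observes it.

```rust
use std::rc::Rc;

// Returns the name of the handle whose drop actually freed the value.
fn freed_by(drop_a_first: bool) -> &'static str {
    let a = Rc::new(String::from("shared"));
    let b = Rc::clone(&a);
    let watcher = Rc::downgrade(&a); // observes without keeping it alive

    // Which handle goes last is decided at runtime.
    let (first, last, last_name) = if drop_a_first { (a, b, "b") } else { (b, a, "a") };

    drop(first); // strong count 2 -> 1: the value survives
    assert!(watcher.upgrade().is_some());
    drop(last); // strong count 1 -> 0: the String is freed here
    assert!(watcher.upgrade().is_none());
    last_name
}
```

With plain ownership the compiler knows statically which scope frees the `String`; with `Rc`, `freed_by(true)` and `freed_by(false)` free it through different handles.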

huangapple
  • Posted on March 7, 2023 at 13:44:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75658412.html