Improving CLOS memory-efficiency in Common Lisp?
Question
CONTEXT:
I've started a long-term project in CL, and one of the subcomponents is a framework for lazy programming, which is meant to be as compatible as possible with external CL code.
One of the classes is lazy-cons.
(defclass lazy-cons (thunk sequences:sequence)
  ((head :initform nil :initarg :head :accessor :head)
   (tail :initform nil :initarg :tail :accessor :tail)))
;; slot 'gen is defined in the thunk superclass, as well
However, when trying to sort lazy lists of more than 1 million numbers, SBCL constantly runs out of heap space and crashes.
Attempted approaches:
I've implemented another class that reduces this issue via cache vectors.
But I'd still like to make lazy-cons itself more space-efficient, as this project is meant to be a background framework for other computing projects, some requiring high-performance computing. Even if the highest-performance code ends up passed to CFFI, the Lisp side should still be as efficient and scalable as possible.
For this project, I also need to be able to extend classes and use defmethod to specialize some functions (e.g. so I can use the trivial-extensible-sequences library). This precludes using structures for smaller representations, as the structure metaclass doesn't allow inheritance from normal classes.
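The metaclass restriction can be seen directly: defstruct's :include option only accepts other structures (the names below are made up for this sketch):

```lisp
;; Structures can only inherit from other structures, via :include:
(defstruct cell head tail)
(defstruct (counted-cell (:include cell)) count)  ; fine

;; But there is no defstruct option for including a standard class,
;; so a structure cannot have a class like SEQUENCES:SEQUENCE as a
;; superclass -- which the extensible-sequences protocol requires.
;; (Methods CAN still be specialized on structure types; the problem
;; is purely the inheritance direction.)
```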
PROBLEM:
Is it possible (e.g. via MOP, maybe?) to control objects' internal representation to be more efficient, while still allowing class inheritance and method specialization?
And if not, is there some way to at least keep the Lisp implementation from running out of heap space?
(I'm aware that you can increase the allocated heap, but that still doesn't guarantee that you won't unexpectedly fall into LDB; ideally, the code should fall into a condition (which can be automatically addressed) when the heap is about to fail, instead of just collapsing.)
Answer 1
Score: 7
Summary: there probably is not much you can usefully do, but it seems at least plausible that your problem is something else.
How much memory do things take up in SBCL?
On a 64-bit platform (well, specifically on ARM64 although I think x64 is the same), here are empirical allocations for various object types (these are worked by allocating a big array and seeing how much space that takes, then allocating a big array and filling it with many copies of various objects, seeing how much space that takes and then doing the appropriate arithmetic, and checking it all makes sense):
- instances of standard-class – 48 bytes + 16 bytes per pair of slots;
- instances of structure-class – 16 bytes + 16 bytes per pair of slots;
- arrays with element type t – 16 bytes + 16 bytes per 2 elements;
- strings – 16 bytes + 16 bytes per 4 characters;
- arrays with element type single-float – 16 bytes + 16 bytes per 4 elements;
- conses – 16 bytes.
The 'n bytes per m elements' comes about because allocation happens on double-word (16-byte) boundaries.
Instances of classes which inherit slots from one or more superclasses don't require more space for them, and don't have bigger headers.
What this shows is that the standard CLOS representation of slots is really as good as it can be if you need to keep non-immediate objects (or large immediate objects, like fixnums say) in the slots.
You probably cannot easily save much memory
Assuming you do want to do that which I'm fairly sure you do, then the only way you can win is by saving the header overhead, which is really an extra four words (32 bytes) for a default CLOS object. My guess is that this is a hopeless endeavour: that header is there for a reason, and you're unlikely to be able to make it go away other than by major surgery.
But is this the real problem?
But the problem you're having is quite surprising: the overhead of a million fully-fledged standard-class instances over the best plausible case (an array with a two-word header) is 32MB. The total size of a million-element list is 16MB; the total size of the equivalent thing implemented as structures with two slots is 32MB, or as CLOS objects with two slots, 64MB. So the cost you're paying is 48MB compared with lists (not surprisingly: a million copies of a 48-byte header), while the best plausible cost is 16MB: 32MB better.
I don't do anything to set the heap size on my SBCL, and it starts with 1GB. This means that the overhead of a million 48-byte headers is 4.8% of the available heap space: if you are hitting this limit you were already almost out of memory.
So my intuition is that something else is the problem here. Chances are, if it's not the sorting algorithm, then if these things are representing lazy objects you're using functions to represent what are essentially promises in some way, and it's those that are killing you. I can't think of any way of implementing laziness in standard CL which doesn't do that: that overhead is unavoidable if so. But even so, functions are not that huge in general.
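The closure-based representation this paragraph alludes to would typically look something like the classic delay/force sketch below (a textbook pattern, not the asker's actual code; force-promise is a name invented here). Each promise costs a closure plus its captured variables on top of whatever object wraps it:

```lisp
;; Classic closure-based promises: each DELAY conses a closure that
;; caches its result on first call.
(defmacro delay (expr)
  `(let ((forced nil) (value nil))
     (lambda ()
       (unless forced
         (setf value ,expr
               forced t))
       value)))

(defun force-promise (promise)
  (funcall promise))

;; (force-promise (delay (expensive))) runs EXPENSIVE once and caches it.
```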
You can, as far as I can tell, at least get SBCL to handle storage exhaustion errors: the following works for me:
(defun foo (n)
  (handler-case
      (length (make-array n))
    (storage-condition (c)
      c)))
However I don't know how reliable this is: in particular I suspect there may be times where it doesn't even have enough space left to collect the garbage.
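One hedged way to act on such a condition, rather than landing in LDB, is to degrade gracefully, e.g. by retrying with a smaller allocation (try-allocate is a name invented for this sketch, and the caveat above still applies: the handler itself needs enough headroom to run):

```lisp
;; Retry with progressively smaller allocations on heap exhaustion.
(defun try-allocate (n)
  (handler-case
      (make-array n)
    (storage-condition ()
      (when (> n 1)
        (try-allocate (floor n 2))))))
```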
Comments