Why does Rust std::alloc allocate with larger than expected gaps?
Question
If I make two allocations using a layout with size 256 and alignment 1024, I would expect the second allocation to land at the first multiple of 1024 after the first (i.e. ptr2 - ptr1 == 1024). Instead, I am finding a gap of twice that size, 2048 bytes, between the two allocations.
    use std::alloc::{alloc, Layout};

    fn main() {
        unsafe {
            let size = 256;
            let alignment = 1024;
            let large_layout = Layout::from_size_align_unchecked(size, alignment);
            // Neither allocation is freed, so both stay live for the comparison.
            let ptr1 = alloc(large_layout) as usize;
            let ptr2 = alloc(large_layout) as usize;
            // I would expect this to print 1024, but it prints 2048...
            println!("Difference1: {}", ptr2 - ptr1);
        }
    }
As I understand it, the alignment means that allocations only occur at multiples of the alignment, which does seem to be true, but something else also seems to be going on. I know that an allocation also needs a word of space to store its size, which could explain why the gap is larger than expected in some cases. However, with size = 256 and alignment = 1024, there should be plenty of space for the allocations to be placed back to back.

Here are some results from my experiments measuring the gap between the pointers for different sizes and alignments. I'm confused by the rows where, instead of rounding up to the nearest alignment, the gap is double what I expect.
| size (bytes) | alignment (bytes) | gap (bytes) | unexpected? |
| ------------ | ----------------- | ----------- | ----------- |
| 4 | 32 | 32 | |
| 8 | 32 | 32 | |
| 16 | 32 | 64 | ??? |
| 32 | 32 | 64 | |
| 4 | 64 | 64 | |
| 8 | 64 | 64 | |
| 16 | 64 | 64 | |
| 32 | 64 | 128 | ??? |
| 64 | 64 | 128 | |
| 256 | 1024 | 2048 | ??? |
| 512 | 1024 | 2048 | ??? |
| 1024 | 1024 | 2048 | |
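For anyone who wants to reproduce measurements like these, here is a minimal sketch. The helper name `gap` is made up for illustration, and the printed numbers depend entirely on the allocator in use, so they may not match the table on another platform.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Allocates two blocks with the same layout and returns the distance
/// between their addresses. Both blocks are freed before returning.
/// `gap` is an illustrative helper, not part of std.
fn gap(size: usize, align: usize) -> usize {
    assert!(size > 0, "zero-sized allocations are not allowed with alloc");
    let layout = Layout::from_size_align(size, align).expect("invalid layout");
    unsafe {
        let p1 = alloc(layout);
        let p2 = alloc(layout);
        assert!(!p1.is_null() && !p2.is_null(), "allocation failed");
        // The allocator is free to place the second block below the first,
        // so take the absolute difference instead of subtracting.
        let distance = (p1 as usize).abs_diff(p2 as usize);
        dealloc(p1, layout);
        dealloc(p2, layout);
        distance
    }
}

fn main() {
    for &(size, align) in &[(4, 32), (16, 32), (32, 64), (256, 1024), (1024, 1024)] {
        println!("size {:>4}, align {:>4}: gap {}", size, align, gap(size, align));
    }
}
```

The only things that can be relied on are that the gap is a multiple of the alignment and at least as large as the size, since both blocks are live at the same time.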
Answer 1
Score: 2
Interface vs Implementation
First of all, `std::alloc::alloc` is an interface, not an implementation. Depending on your platform and the flags passed to the standard library, you may end up with the system allocator (which differs between Windows and Unix), the musl allocator, and so on.
`std::alloc::alloc` is just a thin abstraction layer over whichever memory allocator the standard library uses. It provides a uniform interface, but not necessarily uniform behavior: the only behavioral guarantees (simplifying) are that if you do get a pointer, it will obey the requested layout, and the slice of memory provided will not overlap with any other allocation for its entire lifetime... and that's about it.
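As a rough illustration of how little is promised, the sketch below checks only those two properties: alignment and non-overlap. The helper names (`is_aligned`, `disjoint`, `contract_holds`) are invented for this example. It allocates once through the global interface and once through `System` directly, and deliberately says nothing about where the blocks end up relative to each other.

```rust
use std::alloc::{alloc, dealloc, GlobalAlloc, Layout, System};

/// Returns true if `ptr` is aligned as `layout` requires.
fn is_aligned(ptr: *mut u8, layout: Layout) -> bool {
    (ptr as usize) % layout.align() == 0
}

/// Returns true if two live blocks of `layout.size()` bytes cannot overlap.
fn disjoint(a: *mut u8, b: *mut u8, layout: Layout) -> bool {
    (a as usize).abs_diff(b as usize) >= layout.size()
}

/// Allocates one block through the global allocator and one through
/// `System`, then checks the two guaranteed properties.
fn contract_holds(layout: Layout) -> bool {
    assert!(layout.size() > 0, "zero-sized allocations are not allowed");
    unsafe {
        let a = alloc(layout);
        let b = System.alloc(layout);
        assert!(!a.is_null() && !b.is_null(), "allocation failed");
        let ok = is_aligned(a, layout) && is_aligned(b, layout) && disjoint(a, b, layout);
        // Each block goes back to the allocator it came from.
        dealloc(a, layout);
        System.dealloc(b, layout);
        ok
    }
}

fn main() {
    let layout = Layout::from_size_align(256, 1024).expect("invalid layout");
    println!("contract holds: {}", contract_holds(layout));
}
```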
Size vs Alignment
For efficiency's sake, memory allocators tend to operate with slabs, especially for small sizes. In short, they pick one area of memory and slice it into blocks of equal size. That is:
- A slabs with blocks of up to 8 bytes.
- B slabs with blocks of up to 12 bytes.
- C slabs with blocks of up to 16 bytes.
- ...
This means that when they receive a request requiring an alignment of 2<sup>n</sup> bytes, they'll pick a slab with a block size of at least 2<sup>n</sup> bytes as that's the easiest way to fulfill the request.
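A toy version of that selection might look like the following. Everything here is hypothetical: the `SIZE_CLASSES` table, the restriction to power-of-two block sizes, and the `pick_class` helper are assumptions made for illustration and do not describe any particular allocator.

```rust
/// Hypothetical block sizes (in bytes) of a toy slab allocator.
/// Real allocators use their own, usually finer-grained, tables.
const SIZE_CLASSES: [usize; 8] = [8, 16, 32, 64, 128, 256, 512, 1024];

/// Picks the smallest class whose blocks are at least as large as both the
/// requested size and the requested alignment. If a slab starts at an
/// address aligned to its (power-of-two) block size, every block in it is
/// then aligned as requested. Returns None when no class is large enough.
fn pick_class(size: usize, align: usize) -> Option<usize> {
    let needed = size.max(align);
    SIZE_CLASSES.iter().copied().find(|&class| class >= needed)
}

fn main() {
    println!("{:?}", pick_class(4, 32));     // Some(32)
    println!("{:?}", pick_class(256, 1024)); // Some(1024)
    println!("{:?}", pick_class(2048, 8));   // None
}
```

In this model a 4-byte request with 32-byte alignment occupies a whole 32-byte block, so the request is effectively rounded up to its alignment.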
This is actually hinted at in `Layout::from_size_align`:

> Constructs a `Layout` from a given `size` and `align`, or returns `LayoutError` if any of the following conditions are not met:
>
> - `align` must not be zero,
> - `align` must be a power of two,
> - `size`, when rounded up to the nearest multiple of `align`, must not overflow `isize` (i.e., the rounded value must be less than or equal to `isize::MAX`).
Note the last line, which talks about `size` being rounded up to the nearest multiple of `align`.

It's not guaranteed that rounding will occur, but the implementation may wish to round up, and the guarantee in `Layout` ensures that it can do so soundly.
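The rounding itself can be observed with `Layout::pad_to_align`, which returns a layout whose size is rounded up to a multiple of its alignment. The `padded_size` helper below is an illustrative wrapper, not part of std, and it only shows what an allocator is allowed to assume, not what any allocator actually does.

```rust
use std::alloc::Layout;

/// Returns the size rounded up to the alignment, or None if the
/// (size, align) pair is rejected by `Layout::from_size_align`.
/// Illustrative helper only.
fn padded_size(size: usize, align: usize) -> Option<usize> {
    Layout::from_size_align(size, align)
        .ok()
        .map(|layout| layout.pad_to_align().size())
}

fn main() {
    println!("{:?}", padded_size(256, 1024));  // Some(1024)
    println!("{:?}", padded_size(1025, 1024)); // Some(2048)
    println!("{:?}", padded_size(256, 3));     // None: not a power of two
    println!("{:?}", padded_size(256, 0));     // None: zero alignment
    // Rounding isize::MAX up to a multiple of 1024 would overflow isize.
    println!("{:?}", padded_size(isize::MAX as usize, 1024)); // None
}
```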
Small Gaps
To be sure, you'd need to identify which underlying implementation you are using -- likely your system allocator, which depends on the platform you're using -- and someone would need to dive into the source code.
It's possible that a header is prepended to the allocation by whichever memory allocator you are using, where it stores its own metadata, which would require "padding" the allocated size, and would sometimes result in "bumping" an allocation to the next allocation class.
Or you may see a safety feature at play, where canaries are prepended/appended to catch stray memory writes a posteriori.
Or...
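To make the header scenario concrete, here is a toy model in which every block carries a header and the total is rounded up to the alignment. The 16-byte `HEADER` and both helper names are assumptions chosen for illustration, not values taken from any real allocator.

```rust
/// Hypothetical per-allocation header size in bytes (an assumption).
const HEADER: usize = 16;

/// Rounds `value` up to the next multiple of `align` (a power of two).
fn round_up(value: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Space one allocation occupies in the toy model: payload plus header,
/// rounded up to the alignment.
fn block_span(size: usize, align: usize) -> usize {
    round_up(size + HEADER, align)
}

fn main() {
    for &(size, align) in &[(8, 32), (16, 32), (32, 32), (256, 1024), (1024, 1024)] {
        println!("size {:>4}, align {:>4}: span {}", size, align, block_span(size, align));
    }
}
```

This model doubles the gap whenever the size equals the alignment (32/32, 64/64 and 1024/1024 in the question's table), but it predicts 32, 64 and 1024 for the 16/32, 32/64 and 256/1024 rows, which were measured at twice that. So a simple header alone does not explain the rows marked `???`, which is why the source code of the actual allocator would need to be consulted.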
Big Gaps
Slabs are not typically used for large sizes. Instead, at some point, the allocator will typically switch to using whole numbers of OS pages. On x86, the typical OS page is 4KB, so as sizes approach 4KB you'll see a switch in behavior: goodbye fine-grained slabs, hello coarse-grained pages.
It's quite possible that the particular allocator you are using starts switching at 1KB already; in particular, it's quite possible that for 1KB it doesn't attempt to fit the header within the allocation itself (even when you ask for a lower size).
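Page-granular allocation can be sketched in a few lines. The 4096-byte `PAGE_SIZE` is the typical x86 value mentioned above and is hard-coded here as an assumption; the helper names are made up, and real code would query the operating system for the page size.

```rust
/// Assumed page size in bytes (typical on x86; not queried from the OS).
const PAGE_SIZE: usize = 4096;

/// Number of whole pages needed to hold `size` bytes.
fn pages_for(size: usize) -> usize {
    (size + PAGE_SIZE - 1) / PAGE_SIZE
}

/// Bytes actually reserved when the request is rounded up to whole pages.
fn reserved_bytes(size: usize) -> usize {
    pages_for(size) * PAGE_SIZE
}

fn main() {
    for &size in &[1usize, 4096, 4097, 5000] {
        println!("{:>5} bytes -> {} page(s), {} bytes reserved",
                 size, pages_for(size), reserved_bytes(size));
    }
}
```

With this scheme a request one byte over a page boundary costs a full extra page, which is the coarse-grained behavior described above.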
But Gaps!
Yes, gaps.
As with any piece of software, memory allocators have trade-offs. In general, on modern systems, they tend to favor quick allocation/deallocation over tight memory usage. That means they won't be optimized to fit as many allocations in as tight a space as possible; not at the expense of speed, anyway.
Because, let's face it, users care most about speed.