Can (the underlying array of) a slice with large starting index in Go be allocated memory-efficiently?
Question
I'm trying to use a slice, say mySlice, with a very large starting index. Rather than explicitly subtracting the starting index every time, as in mySlice[index - mySliceStartIndex], I am tempted to define the slice in such a way that I can use it without such arithmetic, as mySlice[index]. Can this be done without allocating memory for all the unused low indices?
The naive way of doing this, allocating a slice and then reslicing it (e.g. mySlice = mySlice[3*1024*1024*1024:4*1024*1024*1024]), is memory-inefficient, because the underlying array must be allocated for the entire range and remains allocated. It does not even work, because afterwards the data formerly at index 3*1024*1024*1024 is at index 0, whilst my goal is to keep it at the original index.
Can I allocate the slice (or its underlying array) in such a way that indices below the slice's start are not allocated, ideally not even initially?
Answer 1
Score: 3
This is not possible without actually allocating the unused parts.
The way a slice is defined in Go is through a reflect.SliceHeader:
type SliceHeader struct {
	Data uintptr
	Len  int
	Cap  int
}
It contains no starting-index field, merely a reference to an underlying, fixed-size array.
It is this underlying array which holds your actual data. The slice is simply a 'window' into that array, and the window always begins at index 0, wherever that 0 may be in the underlying array.
For instance, consider the following code:
a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[2:8]
c := a[8:]
d := b[2:4]
This yields a memory layout as follows:
fixed array: [ 0 1 2 3 4 5 6 7 8 9 ] > [10]int at address 273785072
slice a    :   . . . . . . . . . .   > SliceHeader{Data:273785072 Len:10 Cap:10}
slice b    :       . . . . . .       > SliceHeader{Data:273785080 Len:6 Cap:8}
slice c    :                   . .   > SliceHeader{Data:273785104 Len:2 Cap:2}
slice d    :           . .           > SliceHeader{Data:273785088 Len:2 Cap:6}
The values for Data are simply addresses inside the fixed array, and all four slices share the underlying storage. The numbers below come from a platform where sizeof(int) is 4, which is why two elements span 8 bytes.
a =:= $273785072
b =:= $273785080 =:= $a + sizeof(int)*2 =:= $a + 8
c =:= $273785104 =:= $a + sizeof(int)*8 =:= $a + 32
d =:= $273785088 =:= $b + sizeof(int)*2 =:= $a + sizeof(int)*4 =:= $a + 16
Whatever index you re-slice an existing slice at, the new slice is always indexed from 0 to len(s)-1, because its Data field points directly at its first element in the underlying fixed array.
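Since a slice cannot carry a starting index, one common workaround is to keep the subtraction but hide it behind a small type. The sketch below is not from the answer above: OffsetSlice, NewOffsetSlice, At and Set are hypothetical names, and indices are int64 so that a start such as 3*1024*1024*1024 also compiles where int is 32 bits.

```go
package main

import "fmt"

// OffsetSlice is a hypothetical helper (not part of the standard library).
// It stores only the elements with logical indices in
// [start, start+len(data)) and hides the index subtraction behind methods.
type OffsetSlice struct {
	start int64
	data  []byte
}

// NewOffsetSlice allocates storage for length elements whose logical
// indices begin at start. Nothing below start is allocated.
func NewOffsetSlice(start int64, length int) *OffsetSlice {
	return &OffsetSlice{start: start, data: make([]byte, length)}
}

// At returns the element at logical index i. An index outside the stored
// range panics, just as an out-of-range index on an ordinary slice would.
func (s *OffsetSlice) At(i int64) byte { return s.data[i-s.start] }

// Set stores v at logical index i.
func (s *OffsetSlice) Set(i int64, v byte) { s.data[i-s.start] = v }

func main() {
	const start = 3 * 1024 * 1024 * 1024
	s := NewOffsetSlice(start, 1024)
	s.Set(start+5, 7)
	fmt.Println(s.At(start+5), len(s.data)) // 7 1024
}
```

The arithmetic still happens on every access, but only in one place, and only 1024 bytes are allocated here instead of 4 GiB.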
Memory mapping
If your data is loaded from a file on disk, you have a different option: use syscall.Mmap to provide access to the data through a slice, starting at the desired offset. The returned slice is indexed from 0 and covers only the range you specified.
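import (
	"os"
	"syscall"
)

// mmap maps size bytes of fd, starting at byte offset start, read-only.
// start must be a multiple of the system page size (see os.Getpagesize).
func mmap(fd *os.File, start, size int) ([]byte, error) {
	_, err := fd.Seek(0, 0)
	if err != nil {
		return nil, err
	}
	return syscall.Mmap(int(fd.Fd()), int64(start), size,
		syscall.PROT_READ, syscall.MAP_SHARED)
}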
Do not forget to call syscall.Munmap on the returned slice when you are done using it.