Can (the underlying array of) a slice with large starting index in Go be allocated memory-efficiently?

Question

I'm trying to use a slice, say mySlice, with a very large starting index. Rather than explicitly subtracting the starting index by always using it as mySlice[index - mySliceStartIndex], I am tempted to simply define the slice in such a way that I can use it without such arithmetic as mySlice[index]. Can this be done without allocating memory for all the unused low indices?

The naive way of doing this is to allocate a slice and then reslice it (e.g. mySlice = mySlice[3*1024*1024*1024:4*1024*1024*1024]). Besides being memory inefficient (the underlying array has to be allocated for the entire range and stays allocated), this does not even work: afterwards, the data formerly at index 3*1024*1024*1024 is at index 0, whereas my goal is to keep it at its original index.

Can I allocate the slice (or its underlying array) in such a way that indices below the slice's start are not allocated, ideally not even initially?

Answer 1

Score: 3

This is not possible without actually allocating the unused parts.
A slice in Go is described by a reflect.SliceHeader:

type SliceHeader struct {
    Data uintptr
    Len  int
    Cap  int
}

It contains no starting index field, merely a reference to an underlying, fixed-size array.
It is this underlying array which holds your actual data. The slice is simply a 'window' into that array, and the window always begins at index 0, wherever that 0 may lie in the underlying array.
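As a rough check that a slice value really carries only those three words, the sketch below measures its size with unsafe.Sizeof. The helper name headerWords is made up for this illustration, and the three-word result reflects the standard gc toolchain rather than anything the language specification promises.

```go
package main

import (
	"fmt"
	"unsafe"
)

// headerWords (a hypothetical helper) reports how many machine words a slice
// value occupies: one each for the data pointer, the length and the capacity.
// There is no room in it for a start index.
func headerWords() int {
	var s []int
	return int(unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0)))
}

func main() {
	fmt.Println("words in a slice value:", headerWords())
}
```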

For instance, consider the following code:

a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
b := a[2:8]
c := a[8:]
d := b[2:4]

This yields a memory layout as follows:

fixed array: [ 0 1 2 3 4 5 6 7 8 9 ]  > [10]int at address 273785072
slice a    :   . . . . . . . . . .    > SliceHeader{Data:273785072 Len:10 Cap:10}
slice b    :       . . . . . .        > SliceHeader{Data:273785080 Len:6 Cap:8}
slice c    :                   . .    > SliceHeader{Data:273785104 Len:2 Cap:2}
slice d    :           . .            > SliceHeader{Data:273785088 Len:2 Cap:6}

The values for Data are simply addresses within the fixed array, and all four slices share the underlying storage. (As the addresses imply, sizeof(int) is 4 in this example.)

a =:= $273785072
b =:= $273785080 =:= $a + sizeof(int)*2 =:= $a + 8
c =:= $273785104 =:= $a + sizeof(int)*8 =:= $a + 32
d =:= $273785088 =:= $b + sizeof(int)*2 =:= $a + sizeof(int)*4 =:= $a + 16
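The same layout can be checked without looking at raw addresses. This is a minimal sketch: startOffset is a made-up helper, and it assumes the slices were produced with ordinary two-index slice expressions, so that each capacity runs to the end of the shared array.

```go
package main

import "fmt"

// startOffset (hypothetical helper) returns the index in base at which s
// begins. A two-index slice expression keeps the capacity running to the end
// of the backing array, so the difference in capacities is the start offset.
// This does not hold for slices made with a three-index expression.
func startOffset(base, s []int) int {
	return cap(base) - cap(s)
}

func main() {
	a := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	b := a[2:8]
	c := a[8:]
	d := b[2:4]

	views := []struct {
		name string
		s    []int
	}{{"a", a}, {"b", b}, {"c", c}, {"d", d}}

	for _, v := range views {
		off := startOffset(a, v.s)
		// &v.s[0] == &a[off] confirms that element 0 of the slice is the
		// very same memory cell as element off of the original array.
		fmt.Printf("%s: start=%d len=%d cap=%d s[0]=%d aliases a[%d]: %t\n",
			v.name, off, len(v.s), cap(v.s), v.s[0], off, &v.s[0] == &a[off])
	}
}
```

Every slice reports its first element at index 0, even though that element sits at a different position in the shared array.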

At whatever index you re-slice an existing slice, the new slice will always be indexed from 0 to len(s)-1, because its Data pointer refers directly to its first element in the underlying array.
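Since the slice itself cannot remember a start index, the subtraction has to live somewhere. One option is to hide it in a small wrapper type so that callers can keep using the original indices. The following is only a sketch: OffsetSlice, NewOffsetSlice, At and Set are hypothetical names, not part of any library, and int64 indices are an assumption made so that offsets such as 3*1024*1024*1024 also fit on 32-bit platforms.

```go
package main

import "fmt"

// OffsetSlice is a hypothetical wrapper that stores only the used range and
// translates "large" indices into positions in that range.
type OffsetSlice struct {
	start int64  // logical index of data[0]
	data  []byte // backing storage for the used range only
}

// NewOffsetSlice allocates storage for length elements whose logical
// indices run from start to start+length-1.
func NewOffsetSlice(start int64, length int) *OffsetSlice {
	return &OffsetSlice{start: start, data: make([]byte, length)}
}

// At returns the element at logical index i. An index outside the range
// panics, just like an out-of-range index on a plain slice.
func (o *OffsetSlice) At(i int64) byte { return o.data[i-o.start] }

// Set stores v at logical index i.
func (o *OffsetSlice) Set(i int64, v byte) { o.data[i-o.start] = v }

func main() {
	const start = int64(3) << 30 // 3*1024*1024*1024, as in the question

	// Only 1024 bytes are allocated here; nothing is reserved for the
	// indices below start.
	s := NewOffsetSlice(start, 1024)
	s.Set(start+5, 42)
	fmt.Println(s.At(start + 5))
}
```

The index arithmetic is still performed, but in exactly one place, and no memory is reserved for indices below start.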

Memory mapping

If your data is loaded from a file on disk, you have another option: use syscall.Mmap to access the data through a slice that starts at the desired file offset. The returned slice is still indexed from 0, and it covers only the range you specified.

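import (
    "os"
    "syscall"
)

// mmap maps size bytes of fd, beginning at byte offset start, into memory as
// a read-only slice. On most systems start must be a multiple of the page
// size (see os.Getpagesize).
func mmap(fd *os.File, start, size int) ([]byte, error) {
    // Rewind the file; the mapping itself is addressed by offset.
    if _, err := fd.Seek(0, 0); err != nil {
        return nil, err
    }

    // syscall.Mmap takes the offset as an int64.
    return syscall.Mmap(int(fd.Fd()), int64(start), size,
        syscall.PROT_READ, syscall.MAP_SHARED)
}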

Do not forget to call syscall.Munmap on the returned slice when you are done using it.
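A usage sketch for Unix-like systems, where syscall.Mmap is available, is shown below. Because the mapping offset generally has to be page-aligned, the hypothetical helper mapRange rounds the requested start down to a page boundary and returns both the view that begins at the requested byte and the full mapping that must later be passed to syscall.Munmap. The temporary file and its contents exist only for this demonstration.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// mapRange (hypothetical helper) maps bytes [start, start+size) of f
// read-only. view begins at the requested byte; mapping is the page-aligned
// slice returned by syscall.Mmap and is the one to hand to syscall.Munmap.
func mapRange(f *os.File, start int64, size int) (view, mapping []byte, err error) {
	pageSize := int64(os.Getpagesize())
	aligned := start - start%pageSize // round down to a page boundary
	delta := int(start - aligned)

	mapping, err = syscall.Mmap(int(f.Fd()), aligned, size+delta,
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, nil, err
	}
	return mapping[delta:], mapping, nil
}

// demo writes three pages of known data to a temporary file, maps ten bytes
// at an unaligned offset, and returns the first mapped byte together with
// the byte that was written at that offset.
func demo() (got, want byte, err error) {
	f, err := os.CreateTemp("", "mmap-demo")
	if err != nil {
		return 0, 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()

	data := make([]byte, 3*os.Getpagesize())
	for i := range data {
		data[i] = byte(i % 251)
	}
	if _, err := f.Write(data); err != nil {
		return 0, 0, err
	}

	start := int64(os.Getpagesize() + 100)
	view, mapping, err := mapRange(f, start, 10)
	if err != nil {
		return 0, 0, err
	}
	defer syscall.Munmap(mapping)

	// view[0] is the byte at file offset start, yet its index is 0.
	return view[0], data[start], nil
}

func main() {
	got, want, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got == want)
}
```

Even here the view is indexed from 0, so a large logical index still has to be translated by subtracting the start offset.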

huangapple
  • Posted on March 10, 2014, 17:22:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/22296232.html