Is there an efficient way of reclaiming over-capacity slices?

Question

I have a large number of allocated slices (a few million) which I have appended to. I'm sure a large number of them are over capacity. I want to try and reduce memory usage.

My first attempt is to iterate over all of them, allocate a new slice of len(oldSlice) and copy the values over. Unfortunately this appears to increase memory usage (up to double) and the garbage collection is slow to reclaim the memory.

Is there a good general way to slim down memory usage for a large number of over-capacity slices?
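The shrink-by-copy step described in the question can be sketched as follows (a minimal illustration with an `int` element type; the `shrink` name is hypothetical, and other element types would need their own copies):

```go
// shrink returns a copy of s whose capacity equals its length, so the
// over-capacity backing array of s becomes eligible for garbage collection
// once nothing else references it.
func shrink(s []int) []int {
	out := make([]int, len(s))
	copy(out, s)
	return out
}
```

As the question notes, both the old and new backing arrays are live during the copy, so peak memory temporarily rises before the collector reclaims the old arrays.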

Answer 1

Score: 1

Choosing the right strategy to allocate your buffers is hard without knowing the exact problem.

In general you can try to reuse your buffers:

type buffer struct{}

var buffers = make(chan *buffer, 1024)

func newBuffer() *buffer {
	select {
	case b := <-buffers:
		return b
	default:
		return &buffer{}
	}
}

func returnBuffer(b *buffer) {
	select {
	case buffers <- b:
	default:
	}
}
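For completeness, the standard library's `sync.Pool` (added in Go 1.3, after this question was asked) implements the same leaky-pool idea; a minimal sketch, where the 1024-byte capacity and the `withBuffer` helper are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// pool hands out reusable byte buffers; New runs only when the pool is empty.
var pool = sync.Pool{
	New: func() interface{} { return make([]byte, 0, 1024) },
}

// withBuffer borrows a buffer, fills it, returns it to the pool,
// and reports the buffer's capacity.
func withBuffer(data string) int {
	b := pool.Get().([]byte)[:0] // reset length, keep capacity
	b = append(b, data...)
	n := cap(b)
	pool.Put(b)
	return n
}

func main() {
	fmt.Println(withBuffer("hello")) // capacity survives round-trips through the pool
}
```

Unlike the hand-rolled channel pool, `sync.Pool` may drop pooled objects at any garbage collection, which is usually what you want for transient buffers.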

Answer 2

Score: -1

The heuristic used in append may not be suitable for all applications. It's designed for use when you don't know the final length of the data you'll be storing. Instead of iterating over them later, I'd try to minimize the amount of extra capacity you're allocating as early as possible. Here's a simple example of one strategy, which is to use a buffer only while the length is not known, and to reuse that buffer:

type buffer struct {
  names []string
  // ... possibly other things
}

// assume this is called frequently and has lots and lots of names
func (b *buffer) readNames(lines *bufio.Scanner) ([]string, error) {
  // Start from zero, so we can re-use capacity
  b.names = b.names[:0]

  for lines.Scan() {
    b.names = append(b.names, lines.Text())
  }

  // Figure out the error
  err := lines.Err()
  if err == io.EOF {
    err = nil
  }

  // Allocate a minimal slice
  out := make([]string, len(b.names))
  copy(out, b.names)
  return out, err
}

Of course, you'll need to modify this if you need something that's safe for concurrent use; for that I'd recommend using a buffered channel as a leaky bucket for storing your buffers.

huangapple
  • Posted on 2014-01-27 03:35:12
  • Please retain this link when reposting: https://go.coder-hub.com/21368220.html