Slice chunking in Go

Question

I have a slice with ~2.1 million log strings in it, and I would like to create a slice of slices with the strings being as evenly distributed as possible.

Here is what I have so far:

// logs is a slice with ~2.1 million strings in it.
var divided = make([][]string, 0)
NumCPU := runtime.NumCPU()
ChunkSize := len(logs) / NumCPU
for i := 0; i < NumCPU; i++ {
	temp := make([]string, 0)
	idx := i * ChunkSize
	end := i * ChunkSize + ChunkSize
	for x := range logs[idx:end] {
		temp = append(temp, logs[x])
	}
	if i == NumCPU {
		for x := range logs[idx:] {
			temp = append(temp, logs[x])
		}
	}
	divided = append(divided, temp)
}

The idx := i * ChunkSize will give me the current "chunk start" for the logs index, and end := i * ChunkSize + ChunkSize will give me the "chunk end", or the end of the range of that chunk. I couldn't find any documentation or examples on how to chunk/split a slice or iterate over a limited range in Go, so this is what I came up with. However, it only copies the first chunk multiple times, so it doesn't work.

How do I (as evenly as possible) chunk a slice in Go?
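
The symptom described above (the first chunk copied over and over) is consistent with how range behaves on a subslice: the index it yields counts from 0 within the subslice, not from idx within logs, so logs[x] keeps reading the start of logs. The i == NumCPU branch also never runs, because the loop stops at i < NumCPU. A minimal sketch of the indexing difference, using made-up sample data and hypothetical helper names (copyByIndex, copyByValue):

```go
package main

import "fmt"

// copyByIndex reproduces the pattern from the question: it ranges over a
// subslice but uses the yielded index against the full slice.
func copyByIndex(logs []string, idx, end int) []string {
	var out []string
	for x := range logs[idx:end] {
		// x is 0, 1, ... (a position inside the subslice), so logs[x]
		// always reads from the beginning of logs.
		out = append(out, logs[x])
	}
	return out
}

// copyByValue ranges over the values instead, which reads the intended
// elements logs[idx], logs[idx+1], ...
func copyByValue(logs []string, idx, end int) []string {
	var out []string
	for _, s := range logs[idx:end] {
		out = append(out, s)
	}
	return out
}

func main() {
	// Sample data standing in for the real logs slice (an assumption).
	logs := []string{"a", "b", "c", "d", "e", "f"}

	fmt.Println(copyByIndex(logs, 2, 4)) // [a b]
	fmt.Println(copyByValue(logs, 2, 4)) // [c d]
}
```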

Answer 1

Score: 99

You don't need to make new slices; just append subslices of logs to the divided slice.

http://play.golang.org/p/vyihJZlDVy

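var divided [][]string

// numCPU is the desired number of chunks, as in the question.
numCPU := runtime.NumCPU()

// Round up so that at most numCPU chunks are produced.
chunkSize := (len(logs) + numCPU - 1) / numCPU

for i := 0; i < len(logs); i += chunkSize {
	end := i + chunkSize

	// The last chunk may be shorter than chunkSize.
	if end > len(logs) {
		end = len(logs)
	}

	// Each chunk is a view into logs; no strings are copied.
	divided = append(divided, logs[i:end])
}

fmt.Printf("%#v\n", divided)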
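
For readers who want to run the snippet above as-is, here is a self-contained sketch that wraps the same loop in a function. The name splitCeil, the parameter n (standing in for numCPU), and the sample data are assumptions made for illustration.

```go
package main

import "fmt"

// splitCeil is a hypothetical helper name: it cuts logs into consecutive
// subslices of ceil(len(logs)/n) elements. It assumes n > 0.
func splitCeil(logs []string, n int) [][]string {
	var divided [][]string
	chunkSize := (len(logs) + n - 1) / n
	for i := 0; i < len(logs); i += chunkSize {
		end := i + chunkSize
		if end > len(logs) {
			end = len(logs)
		}
		// logs[i:end] shares its backing array with logs; nothing is copied.
		divided = append(divided, logs[i:end])
	}
	return divided
}

func main() {
	logs := []string{"l0", "l1", "l2", "l3", "l4", "l5", "l6", "l7", "l8", "l9"}
	for _, c := range splitCeil(logs, 4) {
		fmt.Println(len(c), c)
	}
}
```

With 10 elements and n = 4 the chunk size rounds up to 3, so the chunk lengths are 3, 3, 3 and 1. Because the chunks alias logs, writing to an element of a chunk also changes logs.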

Answer 2

Score: 13

Using generics (Go version >=1.18):

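func chunkBy[T any](items []T, chunkSize int) (chunks [][]T) {
	for chunkSize < len(items) {
		// The three-index slice caps each chunk's capacity at chunkSize,
		// so appending to a chunk cannot overwrite the next one.
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}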

Playground URL

Or if you want to manually set the capacity:

func chunkBy[T any](items []T, chunkSize int) [][]T {
	// Preallocate room for every chunk, including a possible partial one.
	chunks := make([][]T, 0, (len(items)/chunkSize)+1)
	for chunkSize < len(items) {
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}

Playground URL
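
A short usage sketch may help show what the three-index slice expression items[0:chunkSize:chunkSize] buys. It mirrors the chunkBy function above; the guard against a non-positive chunkSize is an addition made for this sketch (without it, a chunkSize of 0 would loop forever), and the sample data is made up.

```go
package main

import "fmt"

// chunkBy mirrors the generic function above. The chunkSize check is an
// assumption added for this sketch, not part of the original answer.
func chunkBy[T any](items []T, chunkSize int) (chunks [][]T) {
	if chunkSize <= 0 {
		panic("chunkBy: chunkSize must be positive")
	}
	for chunkSize < len(items) {
		items, chunks = items[chunkSize:], append(chunks, items[0:chunkSize:chunkSize])
	}
	return append(chunks, items)
}

func main() {
	items := []int{1, 2, 3, 4, 5, 6, 7}
	chunks := chunkBy(items, 3)
	fmt.Println(chunks) // [[1 2 3] [4 5 6] [7]]

	// The first chunk has capacity 3, so append has to allocate a new
	// array instead of writing into items[3].
	grown := append(chunks[0], 99)
	fmt.Println(grown, items[3]) // [1 2 3 99] 4
}
```

Note that an empty input yields a single empty chunk, since the final append always runs. On Go 1.23 and later, the standard library's slices.Chunk covers the same ground: it returns an iterator over consecutive subslices of up to n elements and panics if n is less than 1.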

Answer 3

Score: 5

Another variant. In my benchmarks it runs about 2.5 times faster than the one proposed by JimB. The tests and benchmarks are here:

https://play.golang.org/p/WoXHqGjozMI

func chunks(xs []string, chunkSize int) [][]string {
	if len(xs) == 0 {
		return nil
	}
	divided := make([][]string, (len(xs)+chunkSize-1)/chunkSize)
	prev := 0
	i := 0
	till := len(xs) - chunkSize
	for prev < till {
		next := prev + chunkSize
		divided[i] = xs[prev:next]
		prev = next
		i++
	}
	divided[i] = xs[prev:]
	return divided
}

Answer 4

Score: 3

Per Slice Tricks

Batching with minimal allocation

> Useful if you want to do batch processing on large slices.

actions := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
batchSize := 3
batches := make([][]int, 0, (len(actions) + batchSize - 1) / batchSize)

for batchSize < len(actions) {
    actions, batches = actions[batchSize:], append(batches, actions[0:batchSize:batchSize])
}
batches = append(batches, actions)

Yields the following:

[[0 1 2] [3 4 5] [6 7 8] [9]]

Answer 5

Score: 1

func chunkSlice(items []int32, chunkSize int32) (chunks [][]int32) {
	// While there are more items remaining than chunkSize...
	for chunkSize < int32(len(items)) {
		// ...take a slice of size chunkSize from items and append it to chunks,
		chunks = append(chunks, items[0:chunkSize])
		// then drop those elements from items.
		items = items[chunkSize:]
	}
	// Finally, append the remaining items to chunks and return the result.
	return append(chunks, items)
}

Visual example

Say we want to split an array into chunks of 3:

items:  [1,2,3,4,5,6,7]
chunks: []

items:  [1,2,3,4,5,6,7]
chunks: [[1,2,3]]

items:  [4,5,6,7]
chunks: [[1,2,3]]

items:  [4,5,6,7]
chunks: [[1,2,3],[4,5,6]]

items:  [7]
chunks: [[1,2,3],[4,5,6]]

items:  [7]
chunks: [[1,2,3],[4,5,6],[7]]
return

Answer 6

Score: 0

Use reflect to handle any []T:

https://github.com/kirito41dd/xslice

package main

import (
	"fmt"
	"github.com/kirito41dd/xslice"
)

func main() {
	s := []int{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
	i := xslice.SplitToChunks(s, 3)
	ss := i.([][]int)
	fmt.Println(ss) // [[0 1 2] [3 4 5] [6 7 8] [9]]
}

https://github.com/kirito41dd/xslice/blob/e50d91fa75241a3a03d262ad51c8e4cb2ea4b995/split.go#L12

func SplitToChunks(slice interface{}, chunkSize int) interface{} {
	sliceType := reflect.TypeOf(slice)
	sliceVal := reflect.ValueOf(slice)
	length := sliceVal.Len()
	if sliceType.Kind() != reflect.Slice {
		panic("parameter must be []T")
	}
	n := 0
	if length%chunkSize > 0 {
		n = 1
	}
	SST := reflect.MakeSlice(reflect.SliceOf(sliceType), 0, length/chunkSize+n)
	st, ed := 0, 0
	for st < length {
		ed = st + chunkSize
		if ed > length {
			ed = length
		}
		SST = reflect.Append(SST, sliceVal.Slice(st, ed))
		st = ed
	}
	return SST.Interface()
}

Answer 7

Score: 0

To summarize:

// ChunkStringSlice divides []string into chunks of chunkSize.
// It assumes chunkSize > 0.
func ChunkStringSlice(s []string, chunkSize int) [][]string {
	// An empty input has no chunks; without this check the final
	// slice expression below would use a negative index.
	if len(s) == 0 {
		return nil
	}
	chunkNum := int(math.Ceil(float64(len(s)) / float64(chunkSize)))
	res := make([][]string, 0, chunkNum)
	for i := 0; i < chunkNum-1; i++ {
		res = append(res, s[i*chunkSize:(i+1)*chunkSize])
	}
	res = append(res, s[(chunkNum-1)*chunkSize:])
	return res
}

// ChunkStringSlice2 divides []string into at most chunkNum chunks.
// It assumes chunkNum > 0.
func ChunkStringSlice2(s []string, chunkNum int) [][]string {
	res := make([][]string, 0, chunkNum)
	chunkSize := int(math.Ceil(float64(len(s)) / float64(chunkNum)))
	// Rounding chunkSize up can exhaust s before chunkNum chunks are
	// produced, so stop at len(s) rather than slicing past the end.
	for start := 0; start < len(s); start += chunkSize {
		end := start + chunkSize
		if end > len(s) {
			end = len(s)
		}
		res = append(res, s[start:end])
	}
	return res
}
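
Rounding the chunk size up, as in the functions above, can leave the final chunk much smaller than the others or yield fewer chunks than requested. Since the question asks for chunks that are as even as possible, here is a hedged sketch of a balanced split in which chunk lengths differ by at most one. The name splitEven and the sample data are assumptions made for illustration.

```go
package main

import "fmt"

// splitEven is a hypothetical helper: it divides s into exactly n chunks
// whose lengths differ by at most one. It assumes n > 0; when n exceeds
// len(s), the trailing chunks are empty.
func splitEven(s []string, n int) [][]string {
	res := make([][]string, 0, n)
	base, extra := len(s)/n, len(s)%n
	start := 0
	for i := 0; i < n; i++ {
		size := base
		if i < extra {
			size++ // the first `extra` chunks take one more element
		}
		res = append(res, s[start:start+size])
		start += size
	}
	return res
}

func main() {
	s := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j"}
	for _, c := range splitEven(s, 4) {
		fmt.Println(len(c), c)
	}
}
```

With 10 elements and n = 4, the chunk lengths are 3, 3, 2 and 2, compared with 3, 3, 3 and 1 when the chunk size is rounded up.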

Answer 8

Score: 0

The go-deeper/chunks module splits a slice of any type (using generics) into chunks with approximately equal sums of values.

package main

import (
	"fmt"

	"github.com/go-deeper/chunks"
)

func main() {
	slice := []int64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	sliceChunks := chunks.Split(slice, 7)

	fmt.Println(sliceChunks)
}

Output:

[[1 2 3 4 5] [6 7 8 9 10]]

huangapple
  • Posted on February 3, 2016, 22:23:49
  • When reposting, please keep the link to this article: https://go.coder-hub.com/35179656.html