Go:从切片中删除多个条目的最快/最干净的方法是什么?

huangapple go评论167阅读模式
英文:

Go: What is the fastest/cleanest way to remove multiple entries from a slice?

问题

你好,以下是翻译好的部分:

如何在下面的代码中实现deleteRecords函数:

示例:

type Record struct {
id int
name string
}

type RecordList []*Record

func deleteRecords( l *RecordList, ids []int ) {
// 假设RecordList可能包含几百个条目。
// 要删除的记录数量大约为10个。
// 有什么最快和最简洁的方法来删除与记录列表中指定的id匹配的记录。
}

英文:

How would you implement the deleteRecords function in the code below:

Example:

type Record struct {
  id int
  name string
}

type RecordList []*Record

func deleteRecords( l *RecordList, ids []int ) {
   // Assume the RecordList can contain several 100 entries.
   // and the number of the of the records to be removed is about 10.
   // What is the fastest and cleanest ways to remove the records that match
   // the id specified in the records list.
}

答案1

得分: 20

我在我的机器上进行了一些微基准测试,尝试了这里回复中给出的大多数方法,当ids列表中的元素多达40个时,这段代码是最快的:

func deleteRecords(data []*Record, ids []int) []*Record {
	w := 0 // 写入索引

loop:
	for _, x := range data {
		for _, id := range ids {
			if id == x.id {
				continue loop
			}
		}
		data[w] = x
		w++
	}
	return data[:w]
}

您没有说明在列表中保留记录的顺序是否重要。如果不重要,那么这个函数比上面的函数更快,而且仍然相当简洁。

func reorder(data []*Record, ids []int) []*Record {
	n := len(data)
	i := 0
loop:
	for i < n {
		r := data[i]
		for _, id := range ids {
			if id == r.id {
				data[i] = data[n-1]
				n--
				continue loop
			}
		}
		i++
	}
	return data[0:n]
}

随着ids数量的增加,线性搜索的成本也会增加。当元素数量达到50个左右时,使用映射或进行二分搜索来查找id会更高效,只要您能避免每次重建映射(或重新排序列表)。当有几百个ids时,即使每次都必须重新构建映射,使用映射或二分搜索也更高效。

如果您希望保留切片的原始内容,可以使用类似以下的代码:

func deletePreserve(data []*Record, ids []int) []*Record {
	wdata := make([]*Record, len(data))
	w := 0
loop:
	for _, x := range data {
		for _, id := range ids {
			if id == x.id {
				continue loop
			}
		}
		wdata[w] = x
		w++
	}
	return wdata[0:w]
}
英文:

I did some micro-benchmarking on my machine, trying out most of the approaches given in the replies here, and this code comes out fastest when you've got up to about 40 elements in the ids list:

func deleteRecords(data []*Record, ids []int) []*Record {
	w := 0 // write index

loop:
	for _, x := range data {
		for _, id := range ids {
			if id == x.id {
				continue loop
			}
		}
		data[w] = x
		w++
	}
	return data[:w]
}

You didn't say whether it's important to preserve the order of records in the list. If you don't then this function is faster than the above and still fairly clean.

func reorder(data []*Record, ids []int) []*Record {
	n := len(data)
	i := 0
loop:
	for i &lt; n {
		r := data[i]
		for _, id := range ids {
			if id == r.id {
				data[i] = data[n-1]
				n--
				continue loop
			}
		}
		i++
	}
	return data[0:n]
}

As the number of ids rises, so does the cost of the linear search. At around 50 elements, using a map or doing a binary search to look up the id becomes more efficient, as long as you can avoid rebuilding the map (or resorting the list) every time. At several hundred ids, it becomes more efficient to use a map or a binary search even if you have to rebuild it every time.

If you wish to preserve original contents of the slice, something like this is more appropriate:

func deletePreserve(data []*Record, ids []int) []*Record {
	wdata := make([]*Record, len(data))
	w := 0
loop:
	for _, x := range data {
		for _, id := range ids {
			if id == x.id {
				continue loop
			}
		}
		wdata[w] = x
		w++
	}
	return wdata[0:w]
}

答案2

得分: 4

对于一个个人项目,我做了类似这样的事情:

func filter(sl []int, fn func(int) bool) []int {
    result := make([]int, 0, len(sl))
    last := 0
    for i, v := range sl {
        if fn(v) {
            result = append(result, sl[last:i]...)
            last = i + 1 
        }   
    }   
    return append(result, sl[last:]...)
}

它不会改变原始数据,但应该相对高效。
可能更好的做法是:

func filter(sl []int, fn func(int) bool) (result []int) {
    for _, v := range sl {
       if !fn(v) {
         result = append(result, v)
       }
    }
    return
}

更简单和清晰。
如果你想原地操作,可能需要这样做:

func filter(sl []int, fn func(int) bool) []int {
    outi := 0
    res := sl
    for _, v := range sl {
        if !fn(v) {
            res[outi] = v 
            outi++
        }   
    }   
    return res[0:outi]
}

你可以优化这个函数,使用copy来复制元素的范围,但这会使代码变得更长,可能不值得。

所以,在这种特定情况下,我可能会这样做:

func deleteRecords(l []*Record, ids []int) []*Record {
    outi := 0
L:
    for _, v := range l { 
        for _, id := range ids {
            if v.id == id {
                continue L
            }   
        }   
        l[outi] = v 
        outi++
    }   
    return l[0:outi]
}

(注意:未经测试。)

没有分配内存,没有花哨的东西,并且假设你提供的记录列表和id列表的大致大小,简单的线性搜索可能与更复杂的方法一样好,但没有任何开销。我意识到我的版本会改变切片并返回一个新的切片,但这在Go中并不是不合乎惯例的,它避免了强制调用点的切片在堆上分配内存。

英文:

For a personal project, I did something like this:

func filter(sl []int, fn func(int) bool) []int {
    result := make([]int, 0, len(sl))
    last := 0
    for i, v := range sl {
        if fn(v) {
            result = append(result, sl[last:i]...)
            last = i + 1 
        }   
    }   
    return append(result, sl[last:]...)
}

It doesn't mutate the original, but should be relatively efficient.
It's probably better to just do:

func filter(sl []int, fn func(int) bool) (result []int) {
    for _, v := range sl {
       if !fn(v) {
         result = append(result, v)
       }
    }
    return
}

Simpler and cleaner.
If you want to do it in-place, you probably want something like:

func filter(sl []int, fn func(int) bool) []int {
    outi := 0
    res := sl
    for _, v := range sl {
        if !fn(v) {
            res[outi] = v 
            outi++
        }   
    }   
    return res[0:outi]
}

You can optimize this to use <code>copy</code> to copy ranges of elements, but that's twice
the code and probably not worth it.

So, in this specific case, I'd probably do something like:

func deleteRecords(l []*Record, ids []int) []*Record {
    outi := 0
L:
    for _, v := range l { 
        for _, id := range ids {
            if v.id == id {
                continue L
            }   
        }   
        l[outi] = v 
        outi++
    }   
    return l[0:outi]
}

(Note: untested.)

No allocations, nothing fancy, and assuming the rough size of the list of Records and the list of ids you presented, a simple linear search is likely to do as well as fancier things but without any overhead. I realize that my version mutates the slice and returns a new slice, but that's not un-idiomatic in Go, and it avoids forcing the slice at the callsite to be heap allocated.

答案3

得分: 2

对于您描述的情况,其中len(ids)大约为10,len(*l)在几百个左右,这应该相对较快,因为它通过原地更新来最小化内存分配。

package main

import (
	"fmt"
	"strconv"
)

type Record struct {
	id   int
	name string
}

type RecordList []*Record

func deleteRecords(l *RecordList, ids []int) {
	rl := *l
	for i := 0; i < len(rl); i++ {
		rid := rl[i].id
		for j := 0; j < len(ids); j++ {
			if rid == ids[j] {
				copy(rl[i:len(*l)-1], rl[i+1:])
				rl[len(rl)-1] = nil
				rl = rl[:len(rl)-1]
				break
			}
		}
	}
	*l = rl
}

func main() {
	l := make(RecordList, 777)
	for i := range l {
		l[i] = &Record{int(i), "name #" + strconv.Itoa(i)}
	}
	ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)}
	fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
	deleteRecords(&l, ids)
	fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
}

输出:

[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776}
[0 1 2 4 8 776 777] 772 777 {1 name #1} {3 name #3} {775 name #775}
英文:

For the case you described, where len(ids) is approximately 10 and len(*l) is in the several hundreds, this should be relatively fast, since it minimizes memory allocations by updating in place.

package main

import (
	&quot;fmt&quot;
	&quot;strconv&quot;
)

type Record struct {
	id   int
	name string
}

type RecordList []*Record

func deleteRecords(l *RecordList, ids []int) {
	rl := *l
	for i := 0; i &lt; len(rl); i++ {
		rid := rl[i].id
		for j := 0; j &lt; len(ids); j++ {
			if rid == ids[j] {
				copy(rl[i:len(*l)-1], rl[i+1:])
				rl[len(rl)-1] = nil
				rl = rl[:len(rl)-1]
				break
			}
		}
	}
	*l = rl
}

func main() {
	l := make(RecordList, 777)
	for i := range l {
		l[i] = &amp;Record{int(i), &quot;name #&quot; + strconv.Itoa(i)}
	}
	ids := []int{0, 1, 2, 4, 8, len(l) - 1, len(l)}
	fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
	deleteRecords(&amp;l, ids)
	fmt.Println(ids, len(l), cap(l), *l[0], *l[1], *l[len(l)-1])
}

Output:

[0 1 2 4 8 776 777] 777 777 {0 name #0} {1 name #1} {776 name #776}
[0 1 2 4 8 776 777] 772 777 {1 name #1} {3 name #3} {775 name #775}

答案4

得分: 2

不要重复搜索id,可以使用一个map。这段代码预先分配了map的完整大小,然后只是在原地移动数组元素。没有其他的分配。

func deleteRecords(l *RecordList, ids []int) {
	m := make(map[int]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	s, x := *l, 0
	for _, r := range s {
		if !m[r.id] {
			s[x] = r
			x++
		}
	}
	*l = s[0:x]
}
英文:

Instead of repeatedly searching ids, you could use a map. This code preallocates the full size of the map, and then just moves array elements in place. There are no other allocations.

func deleteRecords(l *RecordList, ids []int) {
	m := make(map[int]bool, len(ids))
	for _, id := range ids {
		m[id] = true
	}
	s, x := *l, 0
	for _, r := range s {
		if !m[r.id] {
			s[x] = r
			x++
		}
	}
	*l = s[0:x]
}

答案5

得分: 1

使用vector包的Delete方法作为指南,或者只需使用Vector而不是切片。

英文:

Use the vector package's Delete method as a guide, or just use a Vector instead of a slice.

答案6

得分: 0

这里有一个选项,但我希望有更简洁/更快/更具功能性的选择:

func deleteRecords(l *RecordList, ids []int) *RecordList {
    var newList RecordList
    for _, rec := range l {
        toRemove := false
        for _, id := range ids {
            if rec.id == id {
                toRemove = true
            }
        }
        if !toRemove {
            newList = append(newList, rec)
        }
    }
    return newList
}
英文:

Here is one option but I would hope there are cleaner/faster more functional looking ones:

func deleteRecords( l *RecordList, ids []int ) *RecordList {
    var newList RecordList
    for _, rec := range l {
        toRemove := false
        for _, id := range ids {
        if rec.id == id {
            toRemove = true
        }
        if !toRemove {
            newList = append(newList, rec)
        }
    }
    return newList
}

答案7

得分: 0

当l和ids足够大时,最好先对这两个列表进行排序,然后再进行一次循环,而不是使用两个嵌套循环。

英文:

With large enough l and ids it will be more effective to Sort() both lists first and then do a single loop over them instead of two nested loops

huangapple
  • 本文由 发表于 2011年2月17日 03:09:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/5020958.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定