英文:
GO: Slice of unique structs effective reusable implementation
问题
我经常需要根据任意的相等函数来去除重复项。
我需要实现以下功能:
- 快速且占用内存少(不创建映射)
- 可重用且易于使用,类似于slice.Sort()(github.com/bradfitz/slice)
- 不需要保持原始切片的顺序或保留原始切片
- 最好能最小化复制
这个功能可以在Go中实现吗?为什么我所知道的库中没有这个函数呢?
我查看了例如godash(github.com/zillow/godash)的实现,它使用了映射并且不允许任意的比较和相等。
以下是大致的实现方式:
测试:
import (
"reflect"
"testing"
)
type bla struct {
ID string
}
type blas []bla
func (slice blas) Less(i, j int) bool {
return slice[i].ID < slice[j].ID
}
func (slice blas) EqualID(i, j int) bool {
return slice[i].ID == slice[j].ID
}
func Test_Unique(t *testing.T) {
input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
Unique(input, blas(input).Less, blas(input).EqualID)
if !reflect.DeepEqual(expected, input) {
t.Errorf("2: Expected: %v but was %v \n", expected, input)
}
}
我认为实现这个功能需要使用以下内容:
- 仅使用切片作为数据结构,以保持简单和易于排序。
- 一些反射 - 这对我来说是难点!因为我对Go还不熟悉。
英文:
I often need to get rid of duplicates based on arbitrary equals function.
I need implementation that:
- is fast and memory effective (does not create map)
- is reusable and easy to use, think of slice.Sort() (github.com/bradfitz/slice)
- it's not required to keep order of the original slice or preserve original slice
- would be nice to minimize copying
Can this be implemented in go? Why this function is not part of some library I am aware of?
I was looking e.g. godash (github.com/zillow/godash) implementation uses map and does not allow arbitrary less and equal.
Here is how it should approximately look like.
Test:
import (
"reflect"
"testing"
)
type bla struct {
ID string
}
type blas []bla
func (slice blas) Less(i, j int) bool {
return slice[i].ID < slice[j].ID
}
func (slice blas) EqualID(i, j int) bool {
return slice[i].ID == slice[j].ID
}
func Test_Unique(t *testing.T) {
input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
Unique(input, blas(input).Less, blas(input).EqualID)
if !reflect.DeepEqual(expected, input) {
t.Errorf("2: Expected: %v but was %v \n", expected, input)
}
}
What I think will need to be used to implement this:
- Only slices as data structure to keep it simple and for easy sorting.
- Some reflection - the hard part for me! Since I am new to go.
答案1
得分: 2
选项
- 您可以对切片进行排序并检查相邻节点 创建 = O(n logn),查找 = O(log n),插入 = O(n),删除 = O(n)
- 您可以同时使用树和原始切片 创建 = O(n logn),查找 = O(log n),插入 = O(log n),删除 = O(log n)
在树的实现中,您可以将仅索引放入树节点中,并使用为接口定义的相等/小于函数对节点进行评估。
这是一个使用树的示例,这是播放链接1
您需要添加更多函数以使其可用,并且代码不友好缓存,因此您可以改进代码以使其缓存友好
如何使用
- 使表示切片的类型实现 Setter 接口
- set := NewSet(slice),创建一个切片
- 现在 set.T 只有唯一值的索引
- 为其他集合操作实现更多函数到
Set
代码
type Set struct {
T Tree
Slice Setter
}
func NewSet(slice Setter) *Set {
set := new(Set)
set.T = Tree{nil, 0, nil}
set.Slice = slice
for i:=0;i < slice.Len();i++ {
insert(&set.T, slice, i)
}
return set
}
type Setter interface {
Len() int
At(int) (interface{},error)
Less(int, int) bool
Equal(int, int) bool
}
// A Tree is a binary tree with integer values.
type Tree struct {
Left *Tree
Value int
Right *Tree
}
func insert(t *Tree, Setter Setter, index int) *Tree {
if t == nil {
return &Tree{nil, index, nil}
}
if Setter.Equal(t.Value, index) {
return t
}
if Setter.Less(t.Value, index) {
t.Left = insert(t.Left, Setter, index)
return t
}
t.Right = insert(t.Right, Setter, index)
return t
}
英文:
Options
- You can sort slice and check for adjacent nodes creation = O(n logn),lookup = O(log n) , insertion = O(n), deletion = O(n)
- You can use a Tree and the original slice together creation = O(n logn),lookup = O(log n) , insertion = O(log n), deletion = O(log n)
In the tree implementation you may put only the index in tree nodes and evaluation of nodes will be done using the Equal/Less functions defined for the interface.
Here is an example with tree, here is the play link
You have to add more functions to make it usable ,and the code is not cache friendly so you may improve the code for make it cache friendly
How to use
- Make the type representing slice implement Setter interface
- set := NewSet(slice),creates a slice
- now set.T has only unique values indexes
- implement more functions to
Set
for other set operations
Code
type Set struct {
T Tree
Slice Setter
}
func NewSet(slice Setter) *Set {
set := new(Set)
set.T = Tree{nil, 0, nil}
set.Slice = slice
for i:=0;i < slice.Len();i++ {
insert(&set.T, slice, i)
}
return set
}
type Setter interface {
Len() int
At(int) (interface{},error)
Less(int, int) bool
Equal(int, int) bool
}
// A Tree is a binary tree with integer values.
type Tree struct {
Left *Tree
Value int
Right *Tree
}
func insert(t *Tree, Setter Setter, index int) *Tree {
if t == nil {
return &Tree{nil, index, nil}
}
if Setter.Equal(t.Value, index) {
return t
}
if Setter.Less(t.Value, index) {
t.Left = insert(t.Left, Setter, index)
return t
}
t.Right = insert(t.Right, Setter, index)
return t
}
答案2
得分: 0
布隆过滤器经常用于等值测试。例如,有一个在GitHub上获得了一些星星的实现,可以在https://github.com/willf/bloom找到。这个特定的实现使用murmur3进行哈希处理,并使用位集作为过滤器,因此比映射更高效。
英文:
Bloom filter is frequently used for equality test. There is https://github.com/willf/bloom for example, which awarded some stars on github. This particular implementation uses murmur3 for hashing and bitset for filter, so can be more efficient than map.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论