GO:切片的唯一结构体有效可重用实现

huangapple go评论101阅读模式
英文:

GO: Slice of unique structs effective reusable implementation

问题

我经常需要根据任意的相等函数来去除重复项。
我需要实现以下功能:

  1. 快速且占用内存少(不创建映射)
  2. 可重用且易于使用,类似于slice.Sort()(github.com/bradfitz/slice)
  3. 不需要保持原始切片的顺序或保留原始切片
  4. 最好能最小化复制

这个功能可以在Go中实现吗?为什么我所知道的库中没有这个函数呢?
我查看了例如godash(github.com/zillow/godash)的实现,它使用了映射并且不允许任意的比较和相等。

以下是大致的实现方式:

测试:

import (
	"reflect"
	"testing"
)

type bla struct {
	ID string
}

type blas []bla

func (slice blas) Less(i, j int) bool {
	return slice[i].ID < slice[j].ID
}

func (slice blas) EqualID(i, j int) bool {
	return slice[i].ID == slice[j].ID
}

func Test_Unique(t *testing.T) {
	input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
	expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
	Unique(input, blas(input).Less, blas(input).EqualID)
	if !reflect.DeepEqual(expected, input) {
		t.Errorf("2: Expected: %v but was %v \n", expected, input)
	}
}

我认为实现这个功能需要使用以下内容:

  • 仅使用切片作为数据结构,以保持简单和易于排序。
  • 一些反射 - 这对我来说是难点!因为我对Go还不熟悉。
英文:

I often need to get rid of duplicates based on arbitrary equals function.
I need implementation that:

  1. is fast and memory effective (does not create map)
  2. is reusable and easy to use, think of slice.Sort() (github.com/bradfitz/slice)
  3. it's not required to keep order of the original slice or preserve original slice
  4. would be nice to minimize copying

Can this be implemented in go? Why this function is not part of some library I am aware of?
I was looking e.g. godash (github.com/zillow/godash) implementation uses map and does not allow arbitrary less and equal.

Here is how it should approximately look like.
Test:

import (
	"reflect"
	"testing"
)

type bla struct {
	ID string
}

type blas []bla

func (slice blas) Less(i, j int) bool {
	return slice[i].ID < slice[j].ID
}

func (slice blas) EqualID(i, j int) bool {
	return slice[i].ID == slice[j].ID
}

func Test_Unique(t *testing.T) {
	input := []bla{bla{ID: "d"}, bla{ID: "a"}, bla{ID: "b"}, bla{ID: "a"}, bla{ID: "c"}, bla{ID: "c"}}
	expected := []bla{bla{ID: "a"}, bla{ID: "b"}, bla{ID: "c"}, bla{ID: "d"}}
	Unique(input, blas(input).Less, blas(input).EqualID)
	if !reflect.DeepEqual(expected, input) {
		t.Errorf("2: Expected: %v but was %v \n", expected, input)
	}
}

What I think will need to be used to implement this:

  • Only slices as data structure to keep it simple and for easy sorting.
  • Some reflection - the hard part for me! Since I am new to go.

答案1

得分: 2

选项

  • 您可以对切片进行排序并检查相邻节点 创建 = O(n logn),查找 = O(log n),插入 = O(n),删除 = O(n)
  • 您可以同时使用树和原始切片 创建 = O(n logn),查找 = O(log n),插入 = O(log n),删除 = O(log n)

在树的实现中,您可以将仅索引放入树节点中,并使用为接口定义的相等/小于函数对节点进行评估。

这是一个使用树的示例,这是播放链接1

您需要添加更多函数以使其可用,并且代码不友好缓存,因此您可以改进代码以使其缓存友好

如何使用

  1. 使表示切片的类型实现 Setter 接口
  2. set := NewSet(slice),创建一个切片
  3. 现在 set.T 只有唯一值的索引
  4. 为其他集合操作实现更多函数到 Set

代码

type Set struct {
	T Tree
    Slice Setter
}

func NewSet(slice Setter) *Set {
    set := new(Set)
    set.T = Tree{nil, 0, nil}
	set.Slice = slice
	for i:=0;i < slice.Len();i++ {
		insert(&set.T, slice, i)
	}
	return set
}

type Setter interface {
	Len() int
	At(int) (interface{},error)
	Less(int, int) bool
	Equal(int, int) bool
}


// A Tree is a binary tree with integer values.
type Tree struct {
	Left  *Tree
	Value int
	Right *Tree
}

func insert(t *Tree, Setter Setter, index int) *Tree {
	if t == nil {
		return &Tree{nil, index, nil}
	}
	if Setter.Equal(t.Value, index) {
		return t
	}

	if Setter.Less(t.Value, index) {
		t.Left = insert(t.Left, Setter, index)
		return t
	}
	t.Right = insert(t.Right, Setter, index)
	return t
}
英文:

Options

  • You can sort slice and check for adjacent nodes creation = O(n logn),lookup = O(log n) , insertion = O(n), deletion = O(n)
  • You can use a Tree and the original slice together creation = O(n logn),lookup = O(log n) , insertion = O(log n), deletion = O(log n)

In the tree implementation you may put only the index in tree nodes and evaluation of nodes will be done using the Equal/Less functions defined for the interface.

Here is an example with tree, here is the play link

You have to add more functions to make it usable ,and the code is not cache friendly so you may improve the code for make it cache friendly

How to use

  1. Make the type representing slice implement Setter interface
  2. set := NewSet(slice),creates a slice
  3. now set.T has only unique values indexes
  4. implement more functions to Set for other set operations

Code

type Set struct {
	T Tree
    Slice Setter
}

func NewSet(slice Setter) *Set {
    set := new(Set)
    set.T = Tree{nil, 0, nil}
	set.Slice = slice
	for i:=0;i < slice.Len();i++ {
		insert(&set.T, slice, i)
	}
	return set
}

type Setter interface {
	Len() int
	At(int) (interface{},error)
	Less(int, int) bool
	Equal(int, int) bool
}


// A Tree is a binary tree with integer values.
type Tree struct {
	Left  *Tree
	Value int
	Right *Tree
}

func insert(t *Tree, Setter Setter, index int) *Tree {
	if t == nil {
		return &Tree{nil, index, nil}
	}
	if Setter.Equal(t.Value, index) {
		return t
	}

	if Setter.Less(t.Value, index) {
		t.Left = insert(t.Left, Setter, index)
		return t
	}
	t.Right = insert(t.Right, Setter, index)
	return t
}

答案2

得分: 0

布隆过滤器经常用于等值测试。例如,有一个在GitHub上获得了一些星星的实现,可以在https://github.com/willf/bloom找到。这个特定的实现使用murmur3进行哈希处理,并使用位集作为过滤器,因此比映射更高效。

英文:

Bloom filter is frequently used for equality test. There is https://github.com/willf/bloom for example, which awarded some stars on github. This particular implementation uses murmur3 for hashing and bitset for filter, so can be more efficient than map.

huangapple
  • 本文由 发表于 2017年1月20日 19:10:09
  • 转载请务必保留本文链接:https://go.coder-hub.com/41762268.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定