Golang中的Map集合

huangapple go评论86阅读模式
英文:

Map Sets in Golang

问题

如果我有一个结构体如下:

type Foo struct {
  title string
  Tags map[string]string
}

如何维护一组唯一的这样的结构体?据我所了解,尽管结构体可以比较相等性,但是映射(map)却不能。这意味着我不能比较上述结构体。因此,我不能只实现“将映射作为集合”的模式。

我能想到的两个可能可行的选项是:将Tags转换为排序后的[][]string,或者使用reflect.Deepequal。有没有更好的想法?

英文:

If I have a struct like:

type Foo struct {
  title string
  Tags map[string]string
}

How might approach maintaining a unique set of such structs? From what I understand, although struct equality is a thing - map equality isn't. This means I can't compare my above structs. Therefore I can't just implement the map as set pattern.

The two options that might work I can think of are: convert the Tags to a sorted [][]string or use reflect.Deepequal. Anyone have a better idea?

答案1

得分: 2

有几种实现这个的方法。James Henstridge实际上有一个很好的想法,我试图实现它。一开始只使用map的性能表现非常差,没有自己的哈希算法。

我解决这个问题的方法是保持一个结构体数组,并在插入时删除任何重复项。

package structset

type Foo struct {
  title string
  Tags  map[string]string
}

func (f Foo) Equals(f2 Foo) bool {
  if f.title != f2.title {
    return false
  }

  if len(f.Tags) != len(f2.Tags) {
    return false
  }

  for k, v := range f.Tags {
    if w, ok := f2.Tags[k]; !ok || v != w {
      return false
    }
  }

  return true
}

type FooSet []Foo

func (this FooSet) Add(value Foo) {
  if !this.Contains(value) {
    this = append(this, value)
  }
}

func (this FooSet) Length() int {
  return len(this)
}

func (this FooSet) Contains(f Foo) bool {
  for _, v := range this {
    if v.Equals(f) {
      return true
    }
  }
  return false
}

func NewSet() FooSet {
  return FooSet(make([]Foo, 0, 100))
}

我在我的i7-3770K Windows机器上进行了基准测试,结果如下:

BenchmarkSmallSetWithFewCollisions         50000             46615 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             46575 ns/op
BenchmarkSmallSetWithManyCollisions        50000             46605 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2335296 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2352298 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2336796 ns/op
BenchmarkLargeSetWithFewCollisions            50          46805944 ns/op
BenchmarkLargeSetWithMoreCollisions           50          47376016 ns/op
BenchmarkLargeSetWithManyCollisions           50          46815946 ns/op

为了获得极少量的性能提升,你可以先将所有数据插入数组,然后再删除所有重复项。

删除重复项的代码如下:

func (this FooSet) RemoveDuplicates() {
  length := len(this) - 1
  for i := 0; i < length; i++ {
    for j := i + 1; j <= length; j++ {
      if this[i].Equals(this[j]) {
        this[j] = this[length]
        this = this[0:length]
        length--
        j--
      }
    }
  }
}

这个方法的基准测试结果如下:

BenchmarkSmallSetWithFewCollisions         50000             45245 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             45615 ns/op
BenchmarkSmallSetWithManyCollisions        50000             45555 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2294791 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2309293 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2286290 ns/op
BenchmarkLargeSetWithFewCollisions            50          46235870 ns/op
BenchmarkLargeSetWithMoreCollisions           50          46515906 ns/op
BenchmarkLargeSetWithManyCollisions           50          45865824 ns/op

这是将Foo分配给map[string]Foo的基准测试结果。

BenchmarkSmallSetWithFewCollisions         50000             65718 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             64238 ns/op
BenchmarkSmallSetWithManyCollisions        50000             55016 ns/op
BenchmarkMediumSetWithFewCollisions          500           3429435 ns/op
BenchmarkMediumSetWithMoreCollisions         500           3117395 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2826858 ns/op
BenchmarkLargeSetWithFewCollisions            20          82635495 ns/op
BenchmarkLargeSetWithMoreCollisions           20          85285830 ns/op
BenchmarkLargeSetWithManyCollisions           20          73659350 ns/op

我觉得即使map是可哈希的,它的性能也不会很好。

英文:

There are a few ways of implementing this. James Henstridge actually had a good idea, and I attempted to implement it. It performed pretty poorly just to use map in the first place, without my own hash algorithm.

The way I solved this problem is just keep an array of your structs and then remove any duplicates as you insert them.

package structset

type Foo struct {
  title string
  Tags  map[string]string
}

func (f Foo) Equals(f2 Foo) bool {
  if f.title != f2.title {
    return false
  }

  if len(f.Tags) != len(f2.Tags) {
    return false
  }

  for k, v := range f.Tags {
    if w, ok := f2.Tags[k]; !ok || v != w {
      return false
    }
  }

  return true
}

type FooSet []Foo

func (this FooSet) Add(value Foo) {
  if !this.Contains(value) {
    this = append(this, value)
  }
}

func (this FooSet) Length() int {
  return len(this)
}

func (this FooSet) Contains(f Foo) bool {
  for _, v := range this {
    if v.Equals(f) {
      return true
    }
  }
  return false
}

func NewSet() FooSet {
  return FooSet(make([]Foo, 0, 100))
}

I benchmarked this on my i7-3770K Windows machine and got:

BenchmarkSmallSetWithFewCollisions         50000             46615 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             46575 ns/op
BenchmarkSmallSetWithManyCollisions        50000             46605 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2335296 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2352298 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2336796 ns/op
BenchmarkLargeSetWithFewCollisions            50          46805944 ns/op
BenchmarkLargeSetWithMoreCollisions           50          47376016 ns/op
BenchmarkLargeSetWithManyCollisions           50          46815946 ns/op

To eek out a very small amount of performance, you can insert all your data into the array first, and then remove all duplicates after.

The remove duplicates code is:

func (this FooSet) RemoveDuplicates() {
  length := len(this) - 1
  for i := 0; i &lt; length; i++ {
    for j := i + 1; j &lt;= length; j++ {
      if this[i].Equals(this[j]) {
        this[j] = this[length]
        this = this[0:length]
        length--
        j--
      }
    }
  }
}

The benchmarks for this is:

BenchmarkSmallSetWithFewCollisions         50000             45245 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             45615 ns/op
BenchmarkSmallSetWithManyCollisions        50000             45555 ns/op
BenchmarkMediumSetWithFewCollisions         1000           2294791 ns/op
BenchmarkMediumSetWithMoreCollisions        1000           2309293 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2286290 ns/op
BenchmarkLargeSetWithFewCollisions            50          46235870 ns/op
BenchmarkLargeSetWithMoreCollisions           50          46515906 ns/op
BenchmarkLargeSetWithManyCollisions           50          45865824 ns/op

Here is the benchmark of just assigning Foo to a map[string]Foo.

BenchmarkSmallSetWithFewCollisions         50000             65718 ns/op
BenchmarkSmallSetWithMoreCollisions        50000             64238 ns/op
BenchmarkSmallSetWithManyCollisions        50000             55016 ns/op
BenchmarkMediumSetWithFewCollisions          500           3429435 ns/op
BenchmarkMediumSetWithMoreCollisions         500           3117395 ns/op
BenchmarkMediumSetWithManyCollisions        1000           2826858 ns/op
BenchmarkLargeSetWithFewCollisions            20          82635495 ns/op
BenchmarkLargeSetWithMoreCollisions           20          85285830 ns/op
BenchmarkLargeSetWithManyCollisions           20          73659350 ns/op

It seems to me even if a map was hashable, it still wouldn't perform very well.

答案2

得分: 1

根据你的操作,一种选择是将结构体存储为映射中的值,而不是键。为了使其工作,你需要创建一种方法来从每个结构体值生成唯一的键。

可以尝试以下方式:

// 不一定非得是字符串:只需要适合用作映射键的类型即可。
func (foo *Foo) key() string {
    return key_string
}

fooSet := make(map[string]*Foo)

// 存储一个Foo
fooSet[x.key()] = x

// 检查x是否在集合中:
if fooSet[x.key()] != nil {
    println("x在集合中")
}

这种方法的效果取决于你能够为结构体派生一个键的效率有多高。

英文:

Depending on what you are doing, one option might be to store the structs as the values in the map rather than keys. For this to work, you will need to create some way to generate a unique key from each struct value though.

Something like this might work:

// Doesn&#39;t have to be a string: just has to be suitable for use as a map key.
func (foo *Foo) key() string {
    return key_string
}

fooSet := make(map[string] *Foo)

// Store a Foo
fooSet[x.key()] = x

// Check if x is in the set:
if fooSet[x.key()] != nil {
    println(&quot;x is in the set&quot;)
}

How well this works will depend on how efficient you can derive a key for your struct.

答案3

得分: 0

你确定你的示例代码能够正常工作吗?
我认为你需要将指向Add()方法的指针传递给你的代码才能使其正常工作。无论如何,这是我的实现:

package math

// 类型

type IntPoint struct {
    X, Y int
}

// 适用于少量项目的集合实现
type IntPointSet struct {
    slice []IntPoint 
}

// 函数

func (p1 IntPoint) Equals(p2 IntPoint) bool {
    return (p1.X == p2.X) && (p1.Y == p2.Y)
}

func (set *IntPointSet) Add(p IntPoint) {
    if !set.Contains(p) {
        set.slice = append(set.slice, p)
    }
}

func (set IntPointSet) Contains(p IntPoint) bool {
    for _, v := range set.slice {
        if v.Equals(p) {
            return true
        }
    }
    return false
}

func (set IntPointSet) NumElements() int {
    return len(set.slice)
}

func NewIntPointSet() IntPointSet {
    return IntPointSet{(make([]IntPoint, 0, 10))}
}

希望对你有帮助!

英文:

Are you sure your example works?
I believe you have to pass a pointer to the Add() method for your code to work. Anyways, here is my implementation:

package math

// types

type IntPoint struct {
    X, Y int
}

// set implementation for small number of items
type IntPointSet struct {
    slice []IntPoint 
}

// functions

func (p1 IntPoint) Equals(p2 IntPoint) bool {
    return (p1.X == p2.X) &amp;&amp; (p1.Y == p2.Y)
}

func (set *IntPointSet) Add(p IntPoint) {
   	if ! set.Contains(p) {
    	set.slice = append(set.slice, p)
	}
}

func (set IntPointSet) Contains(p IntPoint) bool {
  for _, v := range set.slice {
    if v.Equals(p) {
      return true
    }
  }
  return false
}

func (set IntPointSet) NumElements() int {
    return len(set.slice)
}

func NewIntPointSet() IntPointSet {
  return IntPointSet{(make([]IntPoint, 0, 10))}
}

huangapple
  • 本文由 发表于 2013年12月20日 04:57:41
  • 转载请务必保留本文链接:https://go.coder-hub.com/20691540.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定