In Go is it possible to iterate over a custom type?

Question

I have a custom type which internally has a slice of data.

Is it possible, by implementing some functions or an interface that the range operator needs, to iterate (using range) over my custom type?

Answer 1

Score: 68

The short answer is no.

The long answer is still no, but it's possible to hack it in a way that it sort of works. But to be clear, this is most certainly a hack.

There are a few ways you can do it, but the common theme between them is that you want to somehow transform your data into a type that Go is capable of ranging over.

Approach 1: Slices

Since you mentioned that you have a slice internally, this may be easiest for your use case. The idea is simple: your type should have an Iterate() method (or similar) whose return value is a slice of the appropriate type. When called, a new slice is created containing all of the elements of the data structure in whatever order you'd like them to be iterated over. So, for example:

func (m *MyType) Iterate() []MyElementType { ... }

mm := NewMyType()
for i, v := range mm.Iterate() {
    ...
}

There are a few concerns here. First, allocation - unless you want to expose references to internal data (which, in general, you probably don't), you have to make a new slice and copy all of the elements over. From a big-O standpoint, this isn't that bad (you're doing a linear amount of work iterating over everything anyway), but for practical purposes, it may matter.

Additionally, this doesn't handle iterating over mutating data. This is probably not an issue most of the time, but if you really want to support concurrent updates and certain types of iteration semantics, you might care.
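
As a minimal sketch of the slice approach, the following fills in the elided Iterate() body by copying the internal slice. The field name elements and the concrete definitions of MyType and MyElementType are assumptions made for illustration; the answer does not specify them.

```go
package main

import "fmt"

// MyElementType and MyType are hypothetical stand-ins for the types
// named in the answer; the elements field is an assumption.
type MyElementType int

type MyType struct {
	elements []MyElementType
}

// Iterate returns a copy of the internal slice, so callers cannot
// modify the receiver's data through the returned value.
func (m *MyType) Iterate() []MyElementType {
	out := make([]MyElementType, len(m.elements))
	copy(out, m.elements)
	return out
}

func main() {
	mm := &MyType{elements: []MyElementType{10, 20, 30}}
	for i, v := range mm.Iterate() {
		fmt.Println(i, v)
	}
}
```

The copy is what costs the linear allocation mentioned above. Returning m.elements directly would avoid it, at the price of exposing internal state.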

Approach 2: Channels

Channels are also something that can be ranged over in Go. The idea is to have your Iterate() method spawn a goroutine that will iterate over the elements in your data structure, and write them to a channel. Then, when the iteration is done, the channel can be closed, which will cause the loop to finish. For example:

func (m *MyType) Iterate() <-chan MyElementType {
    c := make(chan MyElementType)
    go func() {
        for _, v := range m.elements {
            c <- v
        }
        close(c)
    }()
    return c
}

mm := NewMyType()
for v := range mm.Iterate() {
    ...
}

There are two advantages of this method over the slice method: first, you don't have to allocate a linear amount of memory (although you may want to make your channel have a bit of a buffer for performance reasons), and second, you can have your iterator play nicely with concurrent updates if you're into that sort of thing.

The big downside of this approach is that, if you're not careful, you can leak goroutines. One way around this is to give your channel a buffer deep enough to hold all of the elements in your data structure, so that the goroutine can fill it and then return even if no elements are read from the channel (the channel can then later be garbage collected). The problem here is that a) you're now back to linear allocation, and b) you have to know up-front how many elements you're going to write, which sort of puts a stop to the whole concurrent-updates thing.

The moral of the story is that channels are cute for iterating, but you probably don't want to actually use them.
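
The fully buffered variant described above might look like the sketch below. The elements field and the type definitions are assumptions for illustration. The slice header is captured before the goroutine starts, so the number of sends always matches the buffer size.

```go
package main

import "fmt"

// Hypothetical stand-ins for the answer's types.
type MyElementType int

type MyType struct {
	elements []MyElementType
}

// Iterate returns a channel whose buffer can hold every element, so the
// producer goroutine finishes even if the caller stops reading early.
func (m *MyType) Iterate() <-chan MyElementType {
	elems := m.elements // capture the current slice header (length is fixed from here on)
	c := make(chan MyElementType, len(elems))
	go func() {
		for _, v := range elems {
			c <- v // never blocks: capacity equals the element count
		}
		close(c)
	}()
	return c
}

func main() {
	mm := &MyType{elements: []MyElementType{1, 2, 3, 4}}
	for v := range mm.Iterate() {
		fmt.Println(v)
		if v == 2 {
			break // safe here: the producer cannot get stuck on a send
		}
	}
}
```

With a buffer this large the goroutine is arguably unnecessary, since the channel could be filled synchronously before returning. That is another sign this variant is really the slice approach in disguise.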

Approach 3: Internal Iterators

Credit to hobbs for getting to this before me, but I'll cover it here for completeness (and because I want to say a bit more about it).

The idea here is to create an iterator object of sorts (or to just have your object only support one iterator at a time, and iterate on it directly), just like you would in languages that support this more directly. What you do, then, is call a Next() method which, a) advances the iterator to the next element and, b) returns a boolean indicating whether or not there's anything left. Then you need a separate Get() method to actually get the value of the current element. The usage of this doesn't actually use the range keyword, but it looks pretty natural nonetheless:

mm := NewMyType()
for mm.Next() {
    v := mm.Get()
    ...
}

There are a few advantages of this technique over the previous two. First, it doesn't involve allocating memory up-front. Second, it supports errors very naturally. While bufio.Scanner is not strictly an iterator, it follows exactly this pattern (its methods are named Scan, Text, and Err). The idea is to have an Error() method which you call after iteration is complete to see whether iteration terminated because it was done, or because an error was encountered midway through. For purely in-memory data structures this may not matter, but for ones that involve I/O (e.g., walking a filesystem tree, iterating over database query results, etc.), it's really nice. So, to complete the code snippet above:

mm := NewMyType()
for mm.Next() {
    v := mm.Get()
    ...
}
if err := mm.Error(); err != nil {
    ...
}
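
For reference, here is one way the type behind that loop could be implemented. Everything in it is an assumption for illustration: the field names, the constructor taking a slice (called NewMyType here), and the rule that a negative element counts as an error, which exists only to exercise the Error() path.

```go
package main

import (
	"errors"
	"fmt"
)

// MyType is a hypothetical single-pass iterator over a slice of ints.
type MyType struct {
	data []int
	pos  int   // index of the next element to hand out
	cur  int   // element most recently made current by Next
	err  error // first error encountered, if any
}

func NewMyType(data []int) *MyType {
	return &MyType{data: data}
}

// Next advances to the next element and reports whether one is available.
// For illustration only, a negative value is treated as an error that
// stops the iteration.
func (m *MyType) Next() bool {
	if m.err != nil || m.pos >= len(m.data) {
		return false
	}
	v := m.data[m.pos]
	if v < 0 {
		m.err = errors.New("negative element")
		return false
	}
	m.cur = v
	m.pos++
	return true
}

// Get returns the current element; it is only meaningful after Next returned true.
func (m *MyType) Get() int { return m.cur }

// Error returns the error that ended the iteration, or nil if it ran to completion.
func (m *MyType) Error() error { return m.err }

func main() {
	mm := NewMyType([]int{1, 2, -3, 4})
	for mm.Next() {
		fmt.Println(mm.Get())
	}
	if err := mm.Error(); err != nil {
		fmt.Println("iteration stopped early:", err)
	}
}
```

Nothing is allocated per iteration, and the loop body never has to deal with errors. They are checked once, after the loop.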

Conclusion

Go doesn't support ranging over arbitrary data structures - or custom iterators - but you can hack it. If you have to do this in production code, the third approach is 100% the way to go, as it is both the cleanest and the least of a hack (after all, the standard library includes this pattern).

Answer 2

Score: 18

No, not using range. When this answer was written, range accepted only arrays, slices, strings, maps, and channels; Go 1.22 later added integers and Go 1.23 added iterator functions.

The usual sort of idiom for iterable things (for example a bufio.Scanner) seems to be

iter := NewIterator(...)
for iter.More() {
    item := iter.Item()
    // do something with item
}

but there's no universal interface (it wouldn't be very useful given the type system anyway), and different types that implement the pattern generally use different names for their More and Item methods (for example, Scan and Text for a bufio.Scanner).
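
To make the bufio.Scanner comparison concrete, here is a small sketch using the real Scan, Text, and Err methods. The helper name collectLines and the sample input are made up for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// collectLines is a hypothetical helper that gathers every line of input
// using the Scan/Text loop, then reports any error via Err.
func collectLines(input string) ([]string, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	var lines []string
	for sc.Scan() { // plays the role of More()
		lines = append(lines, sc.Text()) // plays the role of Item()
	}
	return lines, sc.Err()
}

func main() {
	lines, err := collectLines("alpha\nbeta\ngamma\n")
	if err != nil {
		fmt.Println("scan failed:", err)
		return
	}
	for _, l := range lines {
		fmt.Println(l)
	}
}
```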

Answer 3

Score: 11

joshlf gave an excellent answer, but I'd like to add a couple of things:

Using channels

A typical problem with channel iterators is that you have to range through the entire data structure or the goroutine feeding the channel will be left hanging forever. But this can be quite easily circumvented. Here's one way:

func (s intSlice) chanIter() chan int {
	c := make(chan int)
	go func() {
		for _, i := range s {
			select {
			case c <- i:
			case <-c:
				close(c)
				return
			}
		}
		close(c)
	}()
	return c
}

In this case writing back to the iterator channel interrupts the iteration early:

s := intSlice{1, 2, 3, 4, 5, 11, 22, 33, 44, 55}
c := s.chanIter()
for i := range c {
	fmt.Println(i)
	if i > 30 {
		// Send to c to interrupt
		c <- 0
	}
}

Here it is very important that you don't simply break out of the for loop. You can break, but you must write to the channel first to ensure the goroutine will exit.
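
A different way to get the same early exit, not the one used above, is to signal the producer over a separate done channel instead of writing back on the data channel. The sketch below assumes that design. The done parameter and the firstAbove helper are hypothetical names introduced for the example.

```go
package main

import "fmt"

type intSlice []int

// chanIter streams s on the returned channel until the data runs out or
// done is closed, whichever comes first.
func (s intSlice) chanIter(done <-chan struct{}) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c)
		for _, v := range s {
			select {
			case c <- v:
			case <-done:
				return
			}
		}
	}()
	return c
}

// firstAbove is a hypothetical consumer that stops at the first value
// greater than limit.
func firstAbove(s intSlice, limit int) (int, bool) {
	done := make(chan struct{})
	defer close(done) // releases the producer goroutine on every return path
	for v := range s.chanIter(done) {
		if v > limit {
			return v, true
		}
	}
	return 0, false
}

func main() {
	s := intSlice{1, 2, 3, 4, 5, 11, 22, 33, 44, 55}
	if v, ok := firstAbove(s, 30); ok {
		fmt.Println(v)
	}
}
```

Because the consumer never sends on the data channel, the channel can be returned as receive-only. A plain break or return also becomes safe as long as done is closed afterwards.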

Using closures

A method of iteration I often tend to favour is to use an iterator closure. In this case the iterator is a function value, which, when called repeatedly, returns the next element and indicates whether the iteration can continue:

func (s intSlice) cloIter() func() (int, bool) {
	i := -1
	return func() (int, bool) {
		i++
		if i >= len(s) { // >= so calls made after exhaustion do not index past the end
			return 0, false
		}
		return s[i], true
	}
}

Use it like this:

iter := s.cloIter()
for i, ok := iter(); ok; i, ok = iter() {
    fmt.Println(i)
}

In this case it's perfectly OK to break out of the loop early; iter will eventually be garbage collected.

Playground

Here's the link to the implementations above: http://play.golang.org/p/JC2EpBDQKA
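
The closure technique is not tied to intSlice. As a sketch, assuming Go 1.18 or later for type parameters, a generic helper (the name sliceIter is made up here) can produce the same kind of iterator for any slice:

```go
package main

import "fmt"

// sliceIter is a hypothetical generic helper (requires Go 1.18+). It returns
// a closure that yields successive elements of s, then (zero value, false).
func sliceIter[T any](s []T) func() (T, bool) {
	i := 0
	return func() (T, bool) {
		if i >= len(s) {
			var zero T
			return zero, false
		}
		v := s[i]
		i++
		return v, true
	}
}

func main() {
	next := sliceIter([]string{"a", "b", "c"})
	for v, ok := next(); ok; v, ok = next() {
		fmt.Println(v)
		if v == "b" {
			break // no cleanup needed; the closure is simply dropped
		}
	}
}
```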

Answer 4

Score: 6

There is another option that wasn't mentioned.

You can define an Iter(fn func(int)) method that accepts a function to be called for each item in your custom type.

type MyType struct {
    data []int
}

func (m *MyType) Iter(fn func(int)) {
    for _, item := range m.data {
        fn(item)
    }
}

And it can be used like this:

d := MyType{
    data: []int{1,2,3,4,5},
}

f := func(i int) {
    fmt.Println(i)
}
d.Iter(f)

Playground
Link to working implementation: https://play.golang.org/p/S3CTQmGXj79
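
One limitation of the callback form is that the caller cannot stop partway through. A possible variation, sketched here under the assumption that the callback returns a bool (the method name Each is hypothetical), lets fn return false to end the iteration early:

```go
package main

import "fmt"

type MyType struct {
	data []int
}

// Each is a hypothetical variant of Iter: it calls fn for every item and
// stops as soon as fn returns false.
func (m *MyType) Each(fn func(int) bool) {
	for _, item := range m.data {
		if !fn(item) {
			return
		}
	}
}

func main() {
	d := MyType{data: []int{1, 2, 3, 4, 5}}
	d.Each(func(i int) bool {
		fmt.Println(i)
		return i < 3 // keep going only while i is below 3
	})
}
```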

huangapple
  • Posted on 2016-03-05 13:57:35
  • When reposting, please keep this link: https://go.coder-hub.com/35810674.html