在Golang中实现递归函数的生成器(yield)的惯用方式是什么?

huangapple go评论102阅读模式
英文:

The idiomatic way to implement generators (yield) in Golang for recursive functions

问题

[注意:我阅读了https://stackoverflow.com/q/11385556/142239,这不是它的重复。]

在Python / Ruby / JavaScript / ECMAScript 6中,可以使用语言提供的yield关键字编写生成器函数。在Go中,可以使用goroutine和通道来模拟。

代码

以下代码展示了如何实现一个排列函数(abcd,abdc,acbd,acdb,...,dcba):

// $src/lib/lib.go

package lib

// private, starts with lowercase "p"
func permutateWithChannel(channel chan<- []string, strings, prefix []string) {
    length := len(strings)
    if length == 0 {
        // Base case
        channel <- prefix
        return
    }
    // Recursive case
    newStrings := make([]string, 0, length-1)
    for i, s := range strings {
        // Remove strings[i] and assign the result to newStringI
        // Append strings[i] to newPrefixI
        // Call the recursive case
        newStringsI := append(newStrings, strings[:i]...)
        newStringsI = append(newStringsI, strings[i+1:]...)
        newPrefixI := append(prefix, s)
        permutateWithChannel(channel, newStringsI, newPrefixI)
    }
}

// public, starts with uppercase "P"
func PermutateWithChannel(strings []string) chan []string {
    channel := make(chan []string)
    prefix := make([]string, 0, len(strings))
    go func() {
        permutateWithChannel(channel, strings, prefix)
        close(channel)
    }()
    return channel
}

以下是如何使用它的示例:

// $src/main.go

package main

import (
    "./lib"
    "fmt"
)

var (
    fruits  = []string{"apple", "banana", "cherry", "durian"}
    banned = "durian"
)

func main() {
    channel := lib.PermutateWithChannel(fruits)
    for myFruits := range channel {
        fmt.Println(myFruits)
        if myFruits[0] == banned {
            close(channel)
            //break
        }
    }
}

注意:

上面被注释的break语句是不需要的,因为close(channel)会导致range在下一次迭代时返回false,循环将终止。

问题

如果调用者不需要所有的排列,它需要显式地close()通道,否则通道将一直保持打开状态,直到程序终止(资源泄漏)。另一方面,如果调用者需要所有的排列(即range循环直到结束),调用者不应该close()通道。这是因为对已经关闭的通道进行close()操作会导致运行时恐慌(参见规范中的这里)。然而,如果用于确定是否应该停止的逻辑不像上面所示那么简单,我认为最好使用defer close(channel)

问题

  1. 实现这样的生成器的惯用方式是什么?
  2. 根据惯例,应该由库函数还是调用者负责close()通道?
  3. 修改我的代码如下是否是一个好主意,以便无论如何,调用者都负责defer close()通道?

在库中,将这个代码修改为:

go permutateWithChannel(channel, strings, prefix)

在调用者中,将这个代码修改为:

func main() {
    channel := lib.PermutateWithChannel(fruits)
    defer close(channel)    // <- Added
    for myFruits := range channel {
        fmt.Println(myFruits)
        if myFruits[0] == banned {
            break           // <- Changed
        }
    }
}
  1. 尽管通过执行上述代码无法观察到,且算法的正确性不受影响,但在调用者close()通道之后,运行库代码的goroutine在下一次迭代中尝试向已关闭的通道发送数据时应该会引发panic,如规范中所述,导致它终止。这会产生任何负面影响吗?
  2. 库函数的签名是func(strings []string) chan []string。理想情况下,返回类型应该是<-chan []string以限制其为只接收通道。然而,如果调用者负责close()通道,它不能被标记为“只接收”,因为close()内置函数无法在只接收通道上工作。处理这种情况的惯用方式是什么?
英文:

[ Note: I read https://stackoverflow.com/q/11385556/142239, this is not a duplicate of it. ]

In Python / Ruby / JavaScript / ECMAScript 6, generator functions could be written using the yield keyword provided by the language. In Go, it could be simulated using a goroutine and a channel.

The Code

The following code shows how a permutation function (abcd, abdc, acbd, acdb, ..., dcba) could be implemented:

<!-- language: lang-golang -->

// $src/lib/lib.go

package lib

// private, starts with lowercase &quot;p&quot;
func permutateWithChannel(channel chan&lt;- []string, strings, prefix []string) {
	length := len(strings)
	if length == 0 {
		// Base case
		channel &lt;- prefix
		return
	}
	// Recursive case
	newStrings := make([]string, 0, length-1)
	for i, s := range strings {
		// Remove strings[i] and assign the result to newStringI
		// Append strings[i] to newPrefixI
		// Call the recursive case
		newStringsI := append(newStrings, strings[:i]...)
		newStringsI = append(newStringsI, strings[i+1:]...)
		newPrefixI := append(prefix, s)
		permutateWithChannel(channel, newStringsI, newPrefixI)
	}
}

// public, starts with uppercase &quot;P&quot;
func PermutateWithChannel(strings []string) chan []string {
	channel := make(chan []string)
	prefix := make([]string, 0, len(strings))
	go func() {
		permutateWithChannel(channel, strings, prefix)
		close(channel)
	}()
	return channel
}

Here is how it could be used:

<!-- language: lang-golang -->

// $src/main.go

package main

import (
	&quot;./lib&quot;
	&quot;fmt&quot;
)

var (
	fruits  = []string{&quot;apple&quot;, &quot;banana&quot;, &quot;cherry&quot;, &quot;durian&quot;}
	banned = &quot;durian&quot;
)

func main() {
	channel := lib.PermutateWithChannel(fruits)
	for myFruits := range channel {
		fmt.Println(myFruits)
		if myFruits[0] == banned {
			close(channel)
			//break
		}
	}
}

Note:

The break statement (commented above) is not needed, as close(channel) causes range to return false in the next iteration, the loop will terminate.

The Problem

If the caller doesn't need all the permutations, it needs to close() the channel explicitly, or the channel will not be closed until the program terminates (resource leakage occurs). On the other hand, if the caller needs all the permutations (i.e. the range loops until the end), the caller MUST NOT close() the channel. It is because close()-ing an already closed channel causes a runtime panic (see here in the spec). However, if the logic to determine whether it should stop or not is not as simple as shown above, I think it is better to use defer close(channel).

The Questions

  1. What is the idiomatic way to implement generators like this?
  2. Idiomatically, who should be responsible to close() the channel - the library function or the caller?
  3. Is it a good idea to modify my code like below, so that the caller is responsible to defer close() the channel no matter what?

In the library, modify this:

<!-- language: lang-golang -->

	go func() {
		permutateWithChannel(channel, strings, prefix)
		close(channel)
	}()

to this:

<!-- language: lang-golang -->

	go permutateWithChannel(channel, strings, prefix)

In the caller, modify this:

<!-- language: lang-golang -->

func main() {
	channel := lib.PermutateWithChannel(fruits)
	for myFruits := range channel {
		fmt.Println(myFruits)
		if myFruits[0] == banned {
			close(channel)
		}
	}
}

to this:

<!-- language: lang-golang -->

func main() {
	channel := lib.PermutateWithChannel(fruits)
	defer close(channel)    // &lt;- Added
	for myFruits := range channel {
		fmt.Println(myFruits)
		if myFruits[0] == banned {
			break           // &lt;- Changed
		}
	}
}
  1. Despite it is not observable by executing the code above, and the correctness of the algorithm is not affected, after the caller close()s the channel, the goroutine running the library code should panic when it tries to send to the closed channel in the next iteration, as documented here in the spec, causing it to terminate. Does this cause any negative side-effect?
  2. The signature of the library function is func(strings []string) chan []string. Ideally, the return type should be &lt;-chan []string to restrict it to be receive-only. However, if it is the caller who is responsible to close() the channel, it could not be marked as "receive-only", as the close() built-in function does not work on receive-only channels. What is the idiomatic way to deal with this?

答案1

得分: 39

I. 替代方案

**前言:**我将使用一个更简单的生成器,因为问题不涉及生成器的复杂性,而是生成器和消费者之间的信号以及消费者本身的调用。这个简单的生成器只生成从09的整数。

1. 使用函数值

通过传递一个简单的“消费者”函数,生成-消费模式更加清晰,这样做的优点是它可以返回一个值,用于表示是否需要中止或执行其他操作。

在这个例子中,只有一个事件需要被信号化("abort"),因此消费者函数将具有bool返回类型,表示是否需要中止。

下面是一个将消费者函数值传递给生成器的简单示例:

func generate(process func(x int) bool) {
    for i := 0; i < 10; i++ {
        if process(i) {
            break
        }
    }
}

func main() {
    process := func(x int) bool {
        fmt.Println("Processing", x)
        return x == 3 // 如果 x == 3 则终止
    }
    generate(process)
}

输出结果(在Go Playground上尝试):

Processing 0
Processing 1
Processing 2
Processing 3

注意,消费者(process)不需要是一个“局部”函数,它可以在main()之外声明,例如可以是一个全局函数或来自另一个包的函数。

这种解决方案的潜在缺点是它只使用了一个goroutine来生成和消费值。

2. 使用通道

如果你仍然想使用通道来实现,也是可以的。请注意,由于通道是由生成器创建的,并且由消费者循环接收来自通道的值(理想情况下使用for ... range结构),关闭通道是生成器的责任。通过这种方式,你还可以返回一个只接收的通道。

是的,最好将在生成器中返回的通道的关闭操作放在一个延迟语句中,这样即使生成器发生panic,消费者也不会被阻塞。但是请注意,这个延迟的关闭操作不是在generate()函数中,而是在从generate()开始并作为一个新的goroutine执行的匿名函数中;否则,通道将在从generate()返回之前关闭,这样做没有任何用处...

如果你想要从消费者向生成器发送信号(例如中止并且不再生成更多的值),你可以使用另一个通道,该通道被传递给生成器。由于生成器只会“监听”这个通道,因此它也可以被声明为只接收的通道。如果你只需要信号化一个事件(在我们的例子中是中止),则不需要在这个通道上发送任何值,只需简单地关闭它即可。如果你需要信号化多个事件,可以通过在这个通道上实际发送一个值来实现,该值表示要执行的事件/操作(其中中止可能是多个事件中的一个)。

你可以使用select语句作为处理在返回的通道上发送值和监视传递给生成器的通道的惯用方式。

下面是一个带有abort通道的解决方案:

func generate(abort <-chan struct{}) <-chan int {
    ch := make(chan int)
    go func() {
        defer close(ch)
        for i := 0; i < 10; i++ {
            select {
            case ch <- i:
                fmt.Println("Sent", i)
            case <-abort: // 在关闭的通道上接收可以立即进行
                fmt.Println("Aborting")
                return
            }
        }
    }()
    return ch
}

func main() {
    abort := make(chan struct{})
    ch := generate(abort)
    for v := range ch {
        fmt.Println("Processing", v)
        if v == 3 { // 如果 v == 3 则终止
            close(abort)
            break
        }
    }
    // Sleep to prevent termination so we see if other goroutine panics
    time.Sleep(time.Second)
}

输出结果(在Go Playground上尝试):

Sent 0
Processing 0
Processing 1
Sent 1
Sent 2
Processing 2
Processing 3
Sent 3
Aborting

这种解决方案的明显优势是它已经使用了2个goroutine(一个用于生成值,一个用于消费/处理值),并且非常容易将其扩展为使用任意数量的goroutine处理生成的值,因为生成器返回的通道可以从多个goroutine并发使用 - 通道在设计上是安全的,可以同时接收,不会发生数据竞争;更多信息请阅读:https://stackoverflow.com/questions/34039229/if-i-am-using-channels-properly-should-i-need-to-use-mutexes

II. 未解答问题的回答

在一个goroutine上的“未捕获”panic将结束该goroutine的执行,但不会导致资源泄漏的问题。但是,如果在非panic情况下,作为单独的goroutine执行的函数会释放它分配的资源(在非延迟语句中),那么这段代码显然不会运行,并且可能导致资源泄漏。

你之前没有观察到这一点,是因为当主goroutine终止时程序终止(它不会等待其他非主goroutine完成 - 因此你的其他goroutine没有机会发生panic)。参见规范:程序执行

但要知道,panic()recover()是用于异常情况的,它们不适用于像Java中的异常和try-catch块这样的一般用例。应该避免使用panic,而是通过返回错误(并处理错误!)来处理异常情况,而且panic绝对不应该离开包的“边界”(例如,panic()recover()可能在包的实现中使用是有道理的,但是panic状态应该在包内部“捕获”,而不是让其传递到包外部)。

英文:

I. Alternatives

Foreword: I will use a much simpler generator, because the problem does not concern the generator complexity but rather the signals between the generator and consumer, and the call of the consumer itself. This simple generator just generates the integer numbers from 0 to 9.

1. With a function value

A generate-consumer pattern is much cleaner with a simple consumer function passed, which also has the advantage that it can return a value signalling if abortion or any other action is required.

And since in the example only one event is to be signaled ("abort"), the consumer function will have bool return type, signalling if abort is required.

So see this simple example with a consumer function value passed to the generator:

func generate(process func(x int) bool) {
	for i := 0; i &lt; 10; i++ {
		if process(i) {
			break
		}
	}
}

func main() {
	process := func(x int) bool {
		fmt.Println(&quot;Processing&quot;, x)
		return x == 3 // Terminate if x == 3
	}
	generate(process)
}

Output (try it on the Go Playground):

Processing 0
Processing 1
Processing 2
Processing 3

Note that the consumer (process) does not need to be a "local" function, it can be declared outside of main(), e.g. it can be a global function or a function from another package.

The potential downside of this solution is that it uses only 1 goroutine both for generating and consuming values.

2. With channels

If you still want to do it with channels, you can. Note that since the channel is created by the generator, and since the consumer loops over the values received from the channel (ideally with a for ... range construct), it is the generator's responsibility to close the channel. Settling with this also allows you to return a receive-only channel.

And yes, closing the returned channel in the generator is best done as a deferred statement, so even if the generator panics, the consumer will not get blocked. But note that this deferred close is not in the generate() function but in the anonymous function started from generate() and executed as a new goroutine; else the channel would be closed before it is returned from generate() - not useful at all...

And if you want to signal the generator from the consumer (e.g. to abort and not generate further values), you can use e.g. another channel, which is passed to the generator. Since the generator will only "listen" to this channel, it can also be declared as a receive-only channel to the generator. If you only need to signal one event (abort in our case), no need to send any values on this channel, a simple close will do it. If you need to signal multiple events, it can be done by actually sending a value on this channel, the event / action to be carried out (where abort may be one from multiple events).

And you can use the select statement as the idiomatic way to handle sending values on the returned channel and watching the channel passed to the generator.

Here is a solution with an abort channel:

func generate(abort &lt;-chan struct{}) &lt;-chan int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for i := 0; i &lt; 10; i++ {
			select {
			case ch &lt;- i:
				fmt.Println(&quot;Sent&quot;, i)
			case &lt;-abort: // receive on closed channel can proceed immediately
				fmt.Println(&quot;Aborting&quot;)
				return
			}
		}
	}()
	return ch
}

func main() {
	abort := make(chan struct{})
	ch := generate(abort)
	for v := range ch {
		fmt.Println(&quot;Processing&quot;, v)
		if v == 3 { // Terminate if v == 3
			close(abort)
			break
		}
	}
	// Sleep to prevent termination so we see if other goroutine panics
	time.Sleep(time.Second)
}

Output (try it on the Go Playground):

Sent 0
Processing 0
Processing 1
Sent 1
Sent 2
Processing 2
Processing 3
Sent 3
Aborting

The obvious advantage of this solution is that it already uses 2 goroutines (1 that generates values, 1 that consumes/processes them), and it is very easy to extend it to process the generated values with any number of goroutines as the channel returned by the generator can be used from multiple goroutines concurrently - channels are safe to be receiving from concurrently, data races cannot occur, by design; for more read: https://stackoverflow.com/questions/34039229/if-i-am-using-channels-properly-should-i-need-to-use-mutexes

II. Answers to unaddressed questions

An "uncaught" panic on a goroutine will end the execution of the goroutine but will not cause a problem in regards to resource leak. But if the function executed as a separate goroutine would free resources (in non-deferred statements) allocated by it in case of non-panic, that code will obviously not run and will cause resource leak for example.

You haven't observed this because the program terminates when the main goroutine terminates (and it does not wait for other non-main goroutines to finish - so your other goroutines did not get a chance to panic). See Spec: Program execution.

But know that panic() and recover() are for exceptional cases, they are not intended for such general use cases like the Exceptions and try-catch blocks in Java. Panics should be avoided, by returning errors (and handling them!) for example, and panics should definitely not leave the "borders" of packages (e.g. panic() and recover() may be justified to be used in a package implementation, but panicking state should be "caught" inside the package and not let out of it).

答案2

得分: 9

在我看来,生成器通常只是内部封装闭包的包装器。类似这样的代码:

package main

import "fmt"

// 这个 `generator` 函数返回另一个函数,我们在 `generator` 的函数体中匿名定义这个函数。
// 返回的函数通过闭包引用了变量 `data`。
func generator(data int, permutation func(int) int, bound int) func() (int, bool) {
    return func() (int, bool) {
        data = permutation(data)
        return data, data < bound
    }
}

// 排列函数
func increment(j int) int {
    j += 1
    return j
}

func main() {
    // 我们调用 `generator` 函数,并将结果(一个函数)赋值给 `next`。
    // 这个函数值捕获了它自己的 `data` 值,每次调用 `next` 时都会更新。
    next := generator(1, increment, 7)
    // 通过多次调用 `next` 来观察闭包的效果。
    fmt.Println(next())
    fmt.Println(next())
    fmt.Println(next())
    // 为了确认状态是特定函数的唯一状态,创建并测试一个新的函数。
    for next, generation, ok := generator(11, increment, 17), 0, true; ok; {
        generation, ok = next()
        fmt.Println(generation)
    }
}

在我看来,这段代码虽然不像for range那样优雅,但在语义和语法上非常清晰。而且它是有效的。你可以在这里运行它:http://play.golang.org/p/fz8xs0RYz9

英文:

To my mind usually generators are just wrappers around closure internally. Something like this

<!-- language: lang-golang -->

package main

import &quot;fmt&quot;

// This function `generator` returns another function, which
// we define anonymously in the body of `generator`. The
// returned function _closes over_ the variable `data` to
// form a closure.
func generator(data int, permutation func(int) int, bound int) func() (int, bool) {
	return func() (int, bool) {
		data = permutation(data)
		return data, data &lt; bound
	}
}

// permutation function
func increment(j int) int {
	j += 1
	return j
}

func main() {
	// We call `generator`, assigning the result (a function)
	// to `next`. This function value captures its
	// own `data` value, which will be updated each time
	// we call `next`.
	next := generator(1, increment, 7)
	// See the effect of the closure by calling `next`
	// a few times.
	fmt.Println(next())
	fmt.Println(next())
	fmt.Println(next())
	// To confirm that the state is unique to that
	// particular function, create and test a new one.
	for next, generation, ok := generator(11, increment, 17), 0, true; ok; {
		generation, ok = next()
		fmt.Println(generation)
	}
}

It looks not as elegant as 'for range' but quite clear semantically and syntactically for me. And it works http://play.golang.org/p/fz8xs0RYz9

答案3

得分: 3

我同意icza的答案。总结起来,有两种选择:

  1. 映射函数:使用回调函数来迭代一个集合。func myIterationFn(yield func (myType)) (stopIterating bool)。这种方法的缺点是将控制流程交给了myGenerator函数。myIterationFn不是一个符合Python风格的生成器,因为它没有返回可迭代的序列。
  2. 通道:使用通道,并注意泄漏goroutine的问题。可以将myIterationFn转换为一个返回可迭代序列的函数。以下代码提供了这种转换的示例。
myMapper := func(yield func(int) bool) {
    for i := 0; i < 5; i++ {
        if done := yield(i); done {
            return
        }
    }
}
iter, cancel := mapperToIterator(myMapper)
defer cancel() // 这一行非常重要,它可以防止goroutine泄漏。
for value, ok := iter(); ok; value, ok = iter() {
    fmt.Printf("value: %d\n", value)
}

这是一个完整的示例程序。mapperToIterator函数将映射函数转换为生成器。由于Go语言缺乏泛型,需要将interface{}转换为int

package main

import "fmt"

// yieldFn在集合的每个成员上调用,如果迭代应该继续,则返回true。
type yieldFn func(interface{}) (stopIterating bool)

// mapperFn对集合的每个成员调用yieldFn。
type mapperFn func(yieldFn)

// iteratorFn返回迭代中的下一个项或零值。第二个返回值为true表示迭代完成。
type iteratorFn func() (value interface{}, done bool)

// cancelFn应该被调用以清理否则会泄漏的goroutine。
type cancelFn func()

// mapperToIterator将mappingFn转换为iteratorFn版本。必须在迭代结束时调用第二个返回值,否则底层的goroutine将泄漏。
func mapperToIterator(m mapperFn) (iteratorFn, cancelFn) {
    generatedValues := make(chan interface{}, 1)
    stopCh := make(chan interface{}, 1)
    go func() {
        m(func(obj interface{}) bool {
            select {
            case <-stopCh:
                return false
            case generatedValues <- obj:
                return true
            }
        })
        close(generatedValues)
    }()
    iter := func() (value interface{}, notDone bool) {
        value, notDone = <-generatedValues
        return
    }
    return iter, func() {
        stopCh <- nil
    }
}

func main() {
    myMapper := func(yield yieldFn) {
        for i := 0; i < 5; i++ {
            if keepGoing := yield(i); !keepGoing {
                return
            }
        }
    }
    iter, cancel := mapperToIterator(myMapper)
    defer cancel()
    for value, notDone := iter(); notDone; value, notDone = iter() {
        fmt.Printf("value: %d\n", value.(int))
    }
}
英文:

I agree with icza's answer. To summarize, there are are two alternatives:

  1. mapping function: Use a callback to iterate over a collection. func myIterationFn(yield func (myType)) (stopIterating bool). This has the disadvantage of ceding the control flow to myGenerator function. myIterationFn is not a Pythonic generator because it doesn't return an iterable sequence.
  2. channels: Use a channel and be wary of leaking goroutines. It's possible to transform myIterationFn into a function that returns an iterable sequence. The following code provides an example of such a transformation.

<!-- language: lang-golang -->

myMapper := func(yield func(int) bool) {
for i := 0; i &lt; 5; i++ {
if done := yield(i); done {
return
}
}
}
iter, cancel := mapperToIterator(myMapper)
defer cancel() // This line is very important - it prevents goroutine leaks.
for value, ok := iter(); ok; value, ok = iter() {
fmt.Printf(&quot;value: %d\n&quot;, value)
}

Here's a complete program as an example. mapperToIterator does the transformation from a mapping function to a generator. Go's lack of generics requires casting from interface{} to int.

<!-- language: lang-golang -->

package main
import &quot;fmt&quot;
// yieldFn reports true if an iteration should continue. It is called on values
// of a collection.
type yieldFn func(interface{}) (stopIterating bool)
// mapperFn calls yieldFn for each member of a collection.
type mapperFn func(yieldFn)
// iteratorFn returns the next item in an iteration or the zero value. The
// second return value is true when iteration is complete.
type iteratorFn func() (value interface{}, done bool)
// cancelFn should be called to clean up the goroutine that would otherwise leak.
type cancelFn func()
// mapperToIterator returns an iteratorFn version of a mappingFn. The second
// return value must be called at the end of iteration, or the underlying
// goroutine will leak.
func mapperToIterator(m mapperFn) (iteratorFn, cancelFn) {
generatedValues := make(chan interface{}, 1)
stopCh := make(chan interface{}, 1)
go func() {
m(func(obj interface{}) bool {
select {
case &lt;-stopCh:
return false
case generatedValues &lt;- obj:
return true
}
})
close(generatedValues)
}()
iter := func() (value interface{}, notDone bool) {
value, notDone = &lt;-generatedValues
return
}
return iter, func() {
stopCh &lt;- nil
}
}
func main() {
myMapper := func(yield yieldFn) {
for i := 0; i &lt; 5; i++ {
if keepGoing := yield(i); !keepGoing {
return
}
}
}
iter, cancel := mapperToIterator(myMapper)
defer cancel()
for value, notDone := iter(); notDone; value, notDone = iter() {
fmt.Printf(&quot;value: %d\n&quot;, value.(int))
}
}

huangapple
  • 本文由 发表于 2015年12月25日 23:38:02
  • 转载请务必保留本文链接:https://go.coder-hub.com/34464146.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定