英文:
How to use a dedicated channel to signal the end of a crawl job in go
问题
这是对我的上一个问题的跟进。
我正在尝试构建一个网络爬虫的原型,并且我想使用chan来阻塞执行,直到所有的任务都完成,就像这样:
func main() {
    go func() {
        do_stuff()
        stop <- true
    }
    fmt.Println(<-stop)
}
有一个queue函数将任务分发给工作线程。当所有任务完成时,该函数还会关闭通道并发送一个信号。
type Job int
// 模拟处理 HTML 页面并返回更多链接的工作线程
func worker(in chan Job, out chan Job, num int) {
    for element := range in {
        if element%2 == 0 {
            out <- 100*element + 5
            out <- 100*element + 3
            out <- 100*element + 1
        }
    }
}
func queue(toWorkers chan<- Job, fromWorkers <-chan Job, init Job, stop chan bool) {
    var list []Job
    var currentJobs int
    currentJobs = 0
    list = append(list, init)
    done := make(map[Job]bool)
    for {
        var send chan<- Job
        var item Job
        if len(list) > 0 {
            send = toWorkers
            item = list[0]
        } else if currentJobs == 0 {
            close(toWorkers)
            // 这里出了问题!
            stop <- true
            return
        }
        select {
        case send <- item:
            currentJobs += 1
            // 我们发送了一个任务,将其移除
            list = list[1:]
        case thing := <-fromWorkers:
            currentJobs -= 1
            // 收到一个新的任务
            if !done[thing] {
                list = append(list, thing)
                done[thing] = true
            }
        }
    }
}
func main() {
    in := make(chan Job, 1)
    out := make(chan Job, 1)
    stop := make(chan bool)
    // 将任务分发给工作线程
    go queue(in, out, 0, stop)
    for i := 0; i < max_workers; i++ {
        go worker(in, out, i)
    }
    duration := time.Second
    time.Sleep(duration)
    // 这会导致死锁
    fmt.Println(<-stop)
}
如果我理解正确,问题出在stop通道上:当工作线程仍然有任务时,Go 会认为没有人会向该通道发送消息,并宣布死锁。queue函数将关闭toWorkers通道并发送一个信号到stop,但前提是没有未完成的任务。
我漏掉了什么?
英文:
This is a follow up from my previous question.
I am trying to build a prototype for a webcrawler and I want to use a chan to block the execution until all the jobs are done, just as in
func main() {
	go func() {
	    do_stuff()
	    stop <- true
   }
	fmt.Println(<-stop)
}
There is a queue function that dispatch the jobs to the workers. When all jobs are finished, the function will also the channel and send a signal.
type Job int
//simulating a worker that processes a html page and returns some more links
func worker(in chan Job, out chan Job, num int) {
	for element := range in {
		if element%2 == 0 {
			out <- 100*element + 5
			out <- 100*element + 3
			out <- 100*element + 1
		}
	}
}
func queue(toWorkers chan<- Job, fromWorkers <-chan Job, init Job, stop chan bool) {
	var list []Job
	var currentJobs int
	currentJobs = 0
	list = append(list, init)
	done := make(map[Job]bool)
	for {
		var send chan<- Job
		var item Job
		if len(list) > 0 {
			send = toWorkers
			item = list[0]
		} else if currentJobs == 0 {
			close(toWorkers)
			// this messes up everything!
			stop <- true
			return
		}
		select {
		case send <- item:
			currentJobs += 1
			// We sent an item, remove it
			list = list[1:]
		case thing := <-fromWorkers:
			currentJobs -= 1
			// Got a new thing
			if !done[thing] {
				list = append(list, thing)
				done[thing] = true
			}
		}
	}
}
func main() {
	in := make(chan Job, 1)
	out := make(chan Job, 1)
	stop := make(chan bool)
	// dispatches jobs to workers
	go queue(in, out, 0, stop)
	for i := 0; i < max_workers; i++ {
		go worker(in, out, i)
	}
	duration := time.Second
	time.Sleep(duration)
	// this cause deadlock
	fmt.Println(<-stop)
}
If I understand correctly, the problem is with the stop channel: when the workers still have jobs, go thinks that no one will send to that channel and declares deadlock. The function queue will both close the toWorkers channel and send a signal to stop, but not while there are outstanding jobs.
What am I missing?
答案1
得分: 4
使用sync.WaitGroup来等待所有的goroutine结束。
http://golang.org/pkg/sync/#WaitGroup
http://blog.golang.org/pipelines
我在这里提供了一个小例子:http://play.golang.org/p/P30LdV0Gfe
package main
import (
	"fmt"
	"sync"
)
func main() {
	var wg sync.WaitGroup
	routinesNo := 10
	wg.Add(routinesNo)
	for i := 0; i < routinesNo; i++ {
		go func(n int) {
			fmt.Printf("%d ", n)
			wg.Done()
		}(i)
	}
	wg.Wait()
	fmt.Println("\nThe end!")
}
英文:
Use sync.WaitGroup to wait for all the go routines to end.
http://golang.org/pkg/sync/#WaitGroup
http://blog.golang.org/pipelines
I made a small example here: http://play.golang.org/p/P30LdV0Gfe
package main
import (
	"fmt"
	"sync"
)
func main() {
	var wg sync.WaitGroup
	routinesNo := 10
	wg.Add(routinesNo)
	for i := 0; i < routinesNo; i++ {
		go func(n int) {
			fmt.Printf("%d ", n)
			wg.Done()
		}(i)
	}
	wg.Wait()
	fmt.Println("\nThe end!")
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论