英文:
How to use a dedicated channel to signal the end of a crawl job in go
问题
这是对我的上一个问题的跟进。
我正在尝试构建一个网络爬虫的原型,并且我想使用chan
来阻塞执行,直到所有的任务都完成,就像这样:
func main() {
go func() {
do_stuff()
stop <- true
}
fmt.Println(<-stop)
}
有一个queue
函数将任务分发给工作线程。当所有任务完成时,该函数还会关闭通道并发送一个信号。
type Job int
// 模拟处理 HTML 页面并返回更多链接的工作线程
func worker(in chan Job, out chan Job, num int) {
for element := range in {
if element%2 == 0 {
out <- 100*element + 5
out <- 100*element + 3
out <- 100*element + 1
}
}
}
func queue(toWorkers chan<- Job, fromWorkers <-chan Job, init Job, stop chan bool) {
var list []Job
var currentJobs int
currentJobs = 0
list = append(list, init)
done := make(map[Job]bool)
for {
var send chan<- Job
var item Job
if len(list) > 0 {
send = toWorkers
item = list[0]
} else if currentJobs == 0 {
close(toWorkers)
// 这里出了问题!
stop <- true
return
}
select {
case send <- item:
currentJobs += 1
// 我们发送了一个任务,将其移除
list = list[1:]
case thing := <-fromWorkers:
currentJobs -= 1
// 收到一个新的任务
if !done[thing] {
list = append(list, thing)
done[thing] = true
}
}
}
}
func main() {
in := make(chan Job, 1)
out := make(chan Job, 1)
stop := make(chan bool)
// 将任务分发给工作线程
go queue(in, out, 0, stop)
for i := 0; i < max_workers; i++ {
go worker(in, out, i)
}
duration := time.Second
time.Sleep(duration)
// 这会导致死锁
fmt.Println(<-stop)
}
如果我理解正确,问题出在stop
通道上:当工作线程仍然有任务时,Go 会认为没有人会向该通道发送消息,并宣布死锁。queue
函数将关闭toWorkers
通道并发送一个信号到stop
,但前提是没有未完成的任务。
我漏掉了什么?
英文:
This is a follow up from my previous question.
I am trying to build a prototype for a webcrawler and I want to use a chan
to block the execution until all the jobs are done, just as in
func main() {
go func() {
do_stuff()
stop <- true
}
fmt.Println(<-stop)
}
There is a queue
function that dispatch the jobs to the workers. When all jobs are finished, the function will also the channel and send a signal.
type Job int
//simulating a worker that processes a html page and returns some more links
func worker(in chan Job, out chan Job, num int) {
for element := range in {
if element%2 == 0 {
out <- 100*element + 5
out <- 100*element + 3
out <- 100*element + 1
}
}
}
func queue(toWorkers chan<- Job, fromWorkers <-chan Job, init Job, stop chan bool) {
var list []Job
var currentJobs int
currentJobs = 0
list = append(list, init)
done := make(map[Job]bool)
for {
var send chan<- Job
var item Job
if len(list) > 0 {
send = toWorkers
item = list[0]
} else if currentJobs == 0 {
close(toWorkers)
// this messes up everything!
stop <- true
return
}
select {
case send <- item:
currentJobs += 1
// We sent an item, remove it
list = list[1:]
case thing := <-fromWorkers:
currentJobs -= 1
// Got a new thing
if !done[thing] {
list = append(list, thing)
done[thing] = true
}
}
}
}
func main() {
in := make(chan Job, 1)
out := make(chan Job, 1)
stop := make(chan bool)
// dispatches jobs to workers
go queue(in, out, 0, stop)
for i := 0; i < max_workers; i++ {
go worker(in, out, i)
}
duration := time.Second
time.Sleep(duration)
// this cause deadlock
fmt.Println(<-stop)
}
If I understand correctly, the problem is with the stop
channel: when the workers still have jobs, go thinks that no one will send to that channel and declares deadlock. The function queue
will both close the toWorkers
channel and send a signal to stop
, but not while there are outstanding jobs.
What am I missing?
答案1
得分: 4
使用sync.WaitGroup来等待所有的goroutine结束。
http://golang.org/pkg/sync/#WaitGroup
http://blog.golang.org/pipelines
我在这里提供了一个小例子:http://play.golang.org/p/P30LdV0Gfe
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
routinesNo := 10
wg.Add(routinesNo)
for i := 0; i < routinesNo; i++ {
go func(n int) {
fmt.Printf("%d ", n)
wg.Done()
}(i)
}
wg.Wait()
fmt.Println("\nThe end!")
}
英文:
Use sync.WaitGroup to wait for all the go routines to end.
http://golang.org/pkg/sync/#WaitGroup
http://blog.golang.org/pipelines
I made a small example here: http://play.golang.org/p/P30LdV0Gfe
package main
import (
"fmt"
"sync"
)
func main() {
var wg sync.WaitGroup
routinesNo := 10
wg.Add(routinesNo)
for i := 0; i < routinesNo; i++ {
go func(n int) {
fmt.Printf("%d ", n)
wg.Done()
}(i)
}
wg.Wait()
fmt.Println("\nThe end!")
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论