Why is my code causing a stall or race condition?
For some reason, once I started adding strings through a channel in my goroutine, the code stalls when I run it. I thought that it was a scope/closure issue so I moved all code directly into the function to no avail. I have looked through Golang's documentation and all examples look similar to mine so I am kind of clueless as to what is going wrong.
func getPage(url string, c chan<- string, swg sizedwaitgroup.SizedWaitGroup) {
	defer swg.Done()
	doc, err := goquery.NewDocument(url)
	if err != nil {
		fmt.Println(err)
	}
	nodes := doc.Find(".v-card .info")
	for i := range nodes.Nodes {
		el := nodes.Eq(i)
		var name string
		if el.Find("h3.n span").Size() != 0 {
			name = el.Find("h3.n span").Text()
		} else if el.Find("h3.n").Size() != 0 {
			name = el.Find("h3.n").Text()
		}
		address := el.Find(".adr").Text()
		phoneNumber := el.Find(".phone.primary").Text()
		website, _ := el.Find(".track-visit-website").Attr("href")
		//c <- map[string] string{"name":name,"address":address,"Phone Number": phoneNumber,"website": website,};
		c <- fmt.Sprint("%s%s%s%s", name, address, phoneNumber, website)
		fmt.Println([]string{name, address, phoneNumber, website})
	}
}

func getNumPages(url string) int {
	doc, err := goquery.NewDocument(url)
	if err != nil {
		fmt.Println(err)
	}
	pagination := strings.Split(doc.Find(".pagination p").Contents().Eq(1).Text(), " ")
	numItems, _ := strconv.Atoi(pagination[len(pagination)-1])
	return int(math.Ceil(float64(numItems) / 30))
}

func main() {
	arrChan := make(chan string)
	swg := sizedwaitgroup.New(8)
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		swg.Add()
		go getPage(fmt.Sprintf(base_url, item, 1), arrChan, swg)
	}
	swg.Wait()
}
Edit:
I fixed it by passing the sizedwaitgroup by reference, but when I remove the buffer from the channel it stops working. Does that mean I need to know in advance how many elements will be sent to the channel?
Answer 1
Score: 5
#Issue
Building off of Colin Stewart's answer: from the code you have posted, as far as I can tell, your issue is actually with reading from arrChan. You write into it, but nowhere in your code do you read from it.
From the documentation:
> If the channel is unbuffered, the sender blocks until the receiver has received the value. If the channel has a buffer, the sender blocks only until the value
> has been copied to the buffer; if the buffer is full, this means
> waiting until some receiver has retrieved a value.
Making the channel buffered means your code no longer blocks on the channel write (as long as the buffer has room), that is, on the line that looks like:
c <- fmt.Sprint("%s%s%s%s",name,address,phoneNumber,website)
My guess is that if you're still hanging when the channel has a size of 5000, it's because more than 5000 values are sent across all of your loops over nodes.Nodes. Once a buffered channel is full, sends block until the channel has space, just as if you were writing to an unbuffered channel.
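To make the "full buffer blocks" behaviour visible without actually hanging the program, here is a small sketch. The helper name trySend and the capacity of 2 are my own choices for illustration and are not from the question; it uses select with a default case to report when a plain send would have blocked.

```go
package main

import "fmt"

// trySend (hypothetical helper) attempts a non-blocking send and reports
// whether it succeeded. Where it returns false, a plain `c <- v` would block.
func trySend(c chan<- string, v string) bool {
	select {
	case c <- v:
		return true
	default:
		return false
	}
}

func main() {
	c := make(chan string, 2) // capacity 2 is arbitrary, just for the demo

	fmt.Println(trySend(c, "a")) // true: the buffer has room
	fmt.Println(trySend(c, "b")) // true: the buffer is now full
	fmt.Println(trySend(c, "c")) // false: buffer full and nobody is receiving

	<-c // a receive frees one slot

	fmt.Println(trySend(c, "c")) // true again
}
```

In the question's code the sends are plain `c <- ...` statements, so wherever this sketch prints false the real goroutine would simply wait for a reader that never arrives.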
#Fix
Here's a minimal example showing how you would fix something like this (basically, just add a reader):
package main

import "sync"

func getPage(item string, c chan<- string) {
	c <- item
}

// readChannel drains the channel forever so that senders never block.
func readChannel(c <-chan string) {
	for {
		<-c
	}
}

func main() {
	arrChan := make(chan string)
	wg := sync.WaitGroup{}
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		wg.Add(1)
		// Pass item as an argument so each goroutine gets its own copy
		// (before Go 1.22 the loop variable was shared across iterations).
		go func(item string) {
			defer wg.Done()
			getPage(item, arrChan)
		}(item)
	}
	go readChannel(arrChan) // comment this out and you'll deadlock
	wg.Wait()
}
Answer 2
Score: 1
Your channel has no buffer, so writes will block until the value can be read, and at least in the code you have posted, there are no readers.
Answer 3
Score: 1
You don't need to know the size to make it work, but you might need it in order to exit cleanly. This can be tricky to observe, because your program exits as soon as the main function returns, and any goroutines still running at that point are killed immediately, finished or not.
As a warm up example, change readChannel in photoionized's response to this:
func readChannel(c <-chan string) {
	for {
		url := <-c
		fmt.Println(url)
	}
}
It only adds printing to the original code, but now you'll see better what is actually happening. Notice how it usually prints only two strings even though the code writes three. This is because the program exits once all the writing goroutines finish, so the reading goroutine is aborted midway. You can "fix" this by removing "go" before readChannel (which is the same as reading the channel in the main function). Then you'll see three strings printed, but the program crashes with a deadlock, because readChannel is still reading from the channel and nobody writes to it anymore. You can fix that too by reading exactly three strings in readChannel(), but that requires knowing how many strings you expect to receive.
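As a rough sketch of that last option (reading a known number of values), something like the following would do. The function name readN and its signature are my own assumptions for illustration, not part of photoionized's code.

```go
package main

import (
	"fmt"
	"sync"
)

// readN (hypothetical helper) receives exactly n values from c and returns
// them, so the caller never waits for a value that will not come.
func readN(c <-chan string, n int) []string {
	got := make([]string, 0, n)
	for i := 0; i < n; i++ {
		got = append(got, <-c)
	}
	return got
}

func main() {
	zips := []string{"78705", "78710", "78715"}
	c := make(chan string)

	var wg sync.WaitGroup
	for _, zip := range zips {
		wg.Add(1)
		go func(z string) {
			defer wg.Done()
			c <- z
		}(zip)
	}

	// Reading happens in main; it returns after len(zips) values,
	// so there is no deadlock and nothing is cut off early.
	for _, v := range readN(c, len(zips)) {
		fmt.Println(v)
	}
	wg.Wait()
}
```

This only works when the number of sends is known up front, which is not the case for the scraper in the question (the number of matched nodes per page varies).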
Here is my minimal working example (I'll use it to illustrate the rest):
package main

import (
	"fmt"
	"sync"
)

func getPage(url string, c chan<- string, wg *sync.WaitGroup) {
	defer wg.Done()
	c <- fmt.Sprintf("Got page for %s\n", url)
}

func readChannel(c chan string, wg *sync.WaitGroup) {
	defer wg.Done()
	var url string
	ok := true
	for ok {
		url, ok = <-c
		if ok {
			fmt.Printf("Received: %s\n", url)
		} else {
			fmt.Println("Exiting readChannel")
		}
	}
}

func main() {
	arrChan := make(chan string)
	var swg sync.WaitGroup
	base_url := "http://test/%s/%d"
	zips := []string{"78705", "78710", "78715"}
	for _, item := range zips {
		swg.Add(1)
		go getPage(fmt.Sprintf(base_url, item, 1), arrChan, &swg)
	}
	var wg2 sync.WaitGroup
	wg2.Add(1)
	go readChannel(arrChan, &wg2)
	swg.Wait()
	// All written, signal end to readChannel by closing the channel
	close(arrChan)
	wg2.Wait()
}
Here I close the channel to signal to readChannel that there is nothing left to read, so it can exit cleanly at the proper time. But sometimes you might instead want to tell readChannel to read exactly three strings and finish. Or maybe you would want to start one reader for each writer, with each reader reading exactly one string. There are many ways to skin a cat, and the choice is all yours.
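For comparison, here is a more compact variant of the same close-to-signal idea, sketched under my own assumptions: the function name produce is hypothetical, a small helper goroutine closes the channel once all writers are done, and the reader uses `for ... range`, which stops by itself when the channel is closed.

```go
package main

import (
	"fmt"
	"sync"
)

// produce (hypothetical helper) starts one writer goroutine per item and
// returns a channel that is closed after every writer has finished.
func produce(items []string) <-chan string {
	c := make(chan string)
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			c <- "Got page for " + s
		}(item)
	}
	// Close from a separate goroutine so produce can return immediately
	// and the caller can start reading while the writers are still running.
	go func() {
		wg.Wait()
		close(c)
	}()
	return c
}

func main() {
	count := 0
	// range exits when the channel is closed and drained.
	for msg := range produce([]string{"78705", "78710", "78715"}) {
		fmt.Println("Received:", msg)
		count++
	}
	fmt.Println("received", count, "messages")
}
```

The reader does not need to know how many values to expect, and because main itself is the reader, the program cannot exit before everything has been received.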
Note that if you remove the wg2.Wait() line, your code becomes equivalent to photoionized's response and will usually print only two strings while writing three. This is because the program exits once all writers finish (ensured by swg.Wait()), but it does not wait for readChannel to finish.
If you remove the close(arrChan) line instead, your code will crash with a deadlock after printing three lines, because main waits for readChannel to finish while readChannel waits to read from a channel that nobody is writing to anymore.
If you just remove "go" before the readChannel call, it becomes equivalent to reading from the channel inside the main function. It will again crash with a deadlock after printing three strings, because readChannel is still reading when all writers have already finished (and readChannel has already read everything they wrote). A tricky point here is that the swg.Wait() line will never be reached, as readChannel never exits.
If you move the readChannel call after swg.Wait(), the code will crash before printing even a single string. But this is a different deadlock. This time the code reaches swg.Wait() and stops there, waiting for the writers. The channel is not buffered, so every writer blocks until someone receives from the channel, and nobody is receiving yet because readChannel has not been called. So it stalls and crashes with a deadlock. This particular issue can be "fixed" by making the channel buffered, as in make(chan string, 3), because that allows writers to keep writing even though nobody is reading from the channel yet. Sometimes this is what you want. But here again you have to know the maximum number of messages that will ever be in the channel buffer. And most of the time it only defers the problem: add one more writer and you are back where you started, with the code stalling and crashing because the channel buffer is full and that one extra writer is waiting for someone to read from it.
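A minimal sketch of the buffered variant, assuming each writer sends exactly one value so the buffer can be sized to the number of writers. The function name collect is my own invention for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// collect (hypothetical helper) lets every writer finish before any read
// happens. This is only safe because the buffer holds one slot per send.
func collect(items []string) []string {
	c := make(chan string, len(items)) // one slot per writer
	var wg sync.WaitGroup
	for _, item := range items {
		wg.Add(1)
		go func(s string) {
			defer wg.Done()
			c <- s // never blocks: there is always a free slot
		}(item)
	}
	wg.Wait() // all values are now sitting in the buffer
	close(c)  // no more sends; lets the range below terminate

	var out []string
	for v := range c {
		out = append(out, v)
	}
	return out
}

func main() {
	results := collect([]string{"78705", "78710", "78715"})
	fmt.Println(len(results), "values collected")
}
```

If a writer sent more than one value (as getPage in the question does, once per matched node), the buffer would fill up and wg.Wait() would hang again, which is the "deferring the problem" case described above.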
Well, this should cover all the bases. So check your code and see which case is yours.