Using a GoLang Routine to Return to Multiple Threads

Question

I am building an API that can queue up requests to GET an external website and subsequently perform some work by interacting with that API. I am trying to figure out how to avoid duplicate simultaneous goroutines.

That is, assume a request comes in for http://www.example.com. A routine is launched to handle that URL, which could take minutes or even hours. Any number of other requests can come in while this is happening. If the new requests are not already being worked on, each should launch its own routine to fulfill it.

However, if another request for example.com comes in, I want the thread that request comes in on to block until the previous example.com request is complete, and only then proceed (duplication is fine: if the first task succeeded, it will be a quick GET to confirm; if it failed, trying again is fine).

All of the code examples I've found use channels or waitgroups, but these constructs appear to unblock only one thread. That is, the first example.com thread is waiting on, say, a channel to return a value, but I can't have example.com request #2 wait on that same channel, since the result can only be read from the channel once.

If it makes any difference, I am building a worker pool with 5 workers, and a worker won't be allocated if another worker is already working on the example.com request.
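
A minimal runnable sketch of such a pool, assuming URLs arrive on a jobs channel (the jobs channel and the handleURL function are illustrative placeholders, not part of the actual API):

package main

import (
    "fmt"
    "sync"
)

// handleURL is a placeholder for the long-running GET and follow-up work.
func handleURL(u string) { fmt.Println("working on", u) }

func main() {
    jobs := make(chan string)
    var wg sync.WaitGroup

    // Start 5 workers, each pulling URLs off the shared jobs channel.
    for i := 0; i < 5; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for u := range jobs {
                handleURL(u)
            }
        }()
    }

    // Queue some requests, then close the channel so the workers exit.
    for _, u := range []string{"http://www.example.com", "http://www.example.org"} {
        jobs <- u
    }
    close(jobs)
    wg.Wait()
}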

I've also considered using a buffer to keep track of the URLs currently being worked on, but I wasn't sure how to block while waiting for a URL to be deleted from the buffer. I saw an example that used an infinite for loop with a break for when the URL is no longer in the buffer, but that seems like unnecessary CPU abuse (indefinitely ranging over the buffer until the URL is no longer present, then breaking).

How do I queue these requests up?

Edit: The solution I was looking for is signal channels. Thanks to Burak Serdar for pointing me to it. If anyone else needs this, Google "signal channels" and you'll find plenty of info.
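
For reference, the heart of the signal-channel pattern is that closing a channel releases every goroutine receiving from it, whereas a sent value is consumed by exactly one receiver. A minimal runnable sketch of that behavior:

package main

import (
    "fmt"
    "sync"
)

func main() {
    done := make(chan struct{})
    var wg sync.WaitGroup

    // Any number of goroutines can block on the same signal channel.
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            <-done // blocks until the channel is closed
            fmt.Println("waiter", id, "released")
        }(i)
    }

    close(done) // one close wakes all three waiters
    wg.Wait()
}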

Answer 1

Score: 0

If you want to block a goroutine based on the URL, you can implement a scheme that looks like this:

First, keep a map of all URLs being worked on:

var urls = make(map[string]chan struct{}) // URL -> channel that is closed when its work completes
var urllock sync.Mutex                    // guards urls

The following addURL adds a URL to the urls map if it is not already there, and returns true along with a channel that will be closed when that task completes. If the URL is already in the map, it returns false along with the existing channel to wait on.

func addURL(u string) (bool, chan struct{}) {
    urllock.Lock()
    defer urllock.Unlock()
    ret, exists := urls[u]
    if exists {
        // Already being worked on: the caller should wait on the channel.
        return false, ret
    }
    ret = make(chan struct{})
    urls[u] = ret
    // New URL: the caller should do the work and close the channel via removeURL.
    return true, ret
}

When you get a new URL to work on, try to put it on the map. If you can, then work on it:

workOnURL, ch := addURL(newURL)
if workOnURL {
    go func() {
        defer removeURL(newURL)
        // Work on URL
    }()
} else {
    <-ch // Wait for the goroutine to finish
    // Then, you can try rescheduling the same URL, or do something else
}

With removeURL, remove the URL from the map, and close the channel, so any goroutine waiting for this to finish can continue:

func removeURL(u string) {
    urllock.Lock()
    defer urllock.Unlock()
    ret := urls[u]
    delete(urls, u)
    close(ret) // wakes every goroutine blocked on <-ret
}
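
For completeness, here is one way to wire the pieces above into a runnable sketch; the three competing goroutines, the worker IDs, and the Sleep standing in for the long-running fetch are illustrative assumptions, not part of the original answer:

package main

import (
    "fmt"
    "sync"
    "time"
)

var urls = make(map[string]chan struct{})
var urllock sync.Mutex

func addURL(u string) (bool, chan struct{}) {
    urllock.Lock()
    defer urllock.Unlock()
    ret, exists := urls[u]
    if exists {
        return false, ret // already in progress: wait on ret
    }
    ret = make(chan struct{})
    urls[u] = ret
    return true, ret
}

func removeURL(u string) {
    urllock.Lock()
    defer urllock.Unlock()
    ret := urls[u]
    delete(urls, u)
    close(ret)
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            workOnURL, ch := addURL("http://www.example.com")
            if workOnURL {
                defer removeURL("http://www.example.com")
                fmt.Println("worker", id, "doing the long fetch")
                time.Sleep(time.Second) // stands in for the real GET and work
                return
            }
            fmt.Println("worker", id, "waiting for the in-flight fetch")
            <-ch // released when removeURL closes the channel
            fmt.Println("worker", id, "can now re-check or reschedule the URL")
        }(i)
    }
    wg.Wait()
}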

