Limit the number of times I ping a hostname every second
Question
I am writing a web crawler to learn Go.
My current implementation uses 10 goroutines to fetch websites, and I want to limit the number of times I can hit a hostname every second.
What is the best (thread-safe) approach to do this?
Answer 1
Score: 1
A channel is a concurrency-safe synchronization mechanism that you can use to coordinate goroutines. You could pair one with a time.Ticker to periodically release a fixed number of function calls.
import "time"

// A PeriodicResource is a channel that is refilled periodically.
type PeriodicResource <-chan bool

// NewPeriodicResource returns a buffered channel that is refilled each time the
// given duration elapses. The capacity of the channel is count, which limits
// a function to at most count calls per tick. The channel starts empty, so the
// first receive waits for the first tick. The ticker is never stopped, so the
// goroutine lives for the rest of the program.
func NewPeriodicResource(count int, reset time.Duration) PeriodicResource {
	ticker := time.NewTicker(reset)
	c := make(chan bool, count)

	go func() {
		for {
			// Await the periodic timer
			<-ticker.C

			// Top the buffer back up to count; only this goroutine sends,
			// so these sends never block
			for i := len(c); i < count; i++ {
				c <- true
			}
		}
	}()

	return c
}
A single goroutine waits for each ticker event and tops the buffered channel up to its capacity. If consumers have not drained the buffer, the next tick only replaces what was taken, so unused slots never accumulate beyond count. You can use the channel to synchronously perform an action at most n times per tick interval. For example, I may want to call doSomething() no more than five times per second.
r := NewPeriodicResource(5, time.Second)
for {
	// Receive a token from the PeriodicResource; this blocks while the buffer is empty
	<-r

	// Each call synchronously draws one token from the periodic resource
	doSomething()
}
Naturally, the same channel could be used to gate calls to go doSomething(), which would start at most five goroutines per second.