How to avoid running into max open files limit
Question
I'm building an application that will download roughly 5000 CSV files in parallel, using goroutines and plain old HTTP GET requests.
I'm currently running into the open-file limit imposed by OS X.
The CSV files are served over HTTP. Is there another network protocol I could use to batch the requests into one? I don't have access to the server, so I can't zip the files there. I'd also prefer not to change the ulimit, because once this is in production I probably won't have access to that configuration.
Answer 1
Score: 3
You probably want to limit active concurrent requests to a more sensible number than 5000. Spin up 10–20 workers and send the individual file URLs to them over a channel.
The HTTP client will reuse connections across requests, provided you always read the entire response body and close it.
Something like this:
package main

import (
	"io"
	"io/ioutil"
	"log"
	"net/http"
	"sync"
)

var ch = make(chan string)
var wg sync.WaitGroup

func main() {
	// Allow plenty of idle connections per host so workers can reuse them.
	http.DefaultTransport.(*http.Transport).MaxIdleConnsPerHost = 100

	for i := 0; i < 10; i++ {
		wg.Add(1)
		go worker()
	}

	var csvs = []string{"http://example.com/a.csv", "http://example.com/b.csv"}
	for _, u := range csvs {
		ch <- u
	}
	close(ch)
	wg.Wait()
}

func worker() {
	defer wg.Done()
	for u := range ch {
		get(u)
	}
}

func get(u string) {
	resp, err := http.Get(u)
	if err != nil {
		log.Println(err) // resp is nil on error, so return before touching it
		return
	}
	// Always drain the rest of the body and close it, so the
	// underlying connection can be reused.
	defer resp.Body.Close()
	defer io.Copy(ioutil.Discard, resp.Body)
	// Read and decode / handle resp.Body here. Make sure to read all of it.
}