Best way to send multiple HTTP requests at the same time when collecting data from websites


Question

I collect data from a website with Python for AI training. I send requests to the website's indexes one at a time. After parsing the HTML, if I find data that is meaningful for my purpose, I save it and send a request to the next index.
There are more than 5 million websites to check, so I think I need to send multiple requests at a time; otherwise I can't finish them.
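A rough estimate shows why concurrency matters at this scale. The snippet assumes about 1 second per request, a hypothetical figure because the question gives no latency. It also assumes throughput scales linearly with the number of requests in flight, which ignores rate limits and bandwidth.

```python
# Back-of-envelope estimate; SECONDS_PER_REQUEST is an assumed value, not measured.
PAGES = 5_000_000
SECONDS_PER_REQUEST = 1.0


def hours_needed(concurrency: int) -> float:
    """Wall-clock hours if `concurrency` requests are always in flight."""
    return PAGES * SECONDS_PER_REQUEST / concurrency / 3600


for c in (1, 10, 100):
    print(f"{c:>3} concurrent: {hours_needed(c):8.1f} h")
```

Under these assumptions, a sequential crawl takes roughly 1,389 hours (about 58 days). With 100 requests in flight, it takes about 14 hours.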

I am looking for the best way to send multiple requests at the same time. I know of these options: threads, multiple Python scripts, and async functions. But I am not sure which one is best.
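For the async-functions option, a common pattern is to cap the number of in-flight requests with an asyncio.Semaphore. The sketch below uses only the standard library:

- fetch_page is a hypothetical stand-in that sleeps instead of doing network I/O. In practice it would wrap a real async HTTP client; aiohttp is one such library, but that choice is an assumption, not something the question uses.
- The crawl name and the default limit of 100 are illustrative.

```python
import asyncio


async def fetch_page(url: str) -> str:
    """Hypothetical stand-in for an HTTP GET; replace with a real async client."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<html>{url}</html>"


async def crawl(urls, limit: int = 100):
    """Fetch every URL, never running more than `limit` fetches at once."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded_fetch(url: str):
        async with semaphore:  # waits here while `limit` fetches are running
            return url, await fetch_page(url)

    # gather returns results in the same order as `urls`.
    return await asyncio.gather(*(bounded_fetch(u) for u in urls))


if __name__ == "__main__":
    urls = [f"https://example.com/index/{i}" for i in range(1000)]
    results = asyncio.run(crawl(urls, limit=50))
    print(len(results))  # 1000
```

asyncio.gather creates one task per URL. For millions of URLs, you would likely call crawl on batches (say, tens of thousands at a time) rather than on the whole list.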

Thank you.

Answer 1

Score: 1

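I would use Requests Futures (the requests-futures package). It is a very simple asynchronous wrapper around Requests that runs each request in a concurrent.futures thread pool. You can use it as follows: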

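from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

# Hypothetical example URLs; replace with the pages you want to check.
urls = ["https://example.com/index/1", "https://example.com/index/2"]

# max_workers caps how many requests run at once (the library default is 8).
with FuturesSession(max_workers=16) as session:
    futures = [session.get(url) for url in urls]
    for future in as_completed(futures):
        res = future.result()
        # The pages are HTML, so read res.text; res.json() only parses JSON bodies.
        print(res.text)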
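At the question's scale there is one thing to watch. The list comprehension above creates a future for every URL before any of them completes, so 5 million URLs would mean 5 million pending futures held in memory.

The following minimal sketch avoids that using only the standard library's concurrent.futures, the module requests-futures builds on. It keeps at most max_workers requests in flight and submits a new URL each time one finishes. The urllib-based fetch function, the crawl name, and the 32-worker default are illustrative assumptions, not part of the answer.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Hypothetical fetcher built on urllib; swap in requests if you prefer."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls, fetch_fn=fetch, max_workers: int = 32):
    """Yield (url, body) pairs with at most max_workers requests in flight."""
    url_iter = iter(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Start the first max_workers requests.
        pending = {pool.submit(fetch_fn, u): u for u in islice(url_iter, max_workers)}
        while pending:
            # Block until at least one request has finished.
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for fut in done:
                url = pending.pop(fut)
                # Refill: submit one new URL for each finished one, if any remain.
                for nxt in islice(url_iter, 1):
                    pending[pool.submit(fetch_fn, nxt)] = nxt
                # result() re-raises any exception raised inside fetch_fn.
                yield url, fut.result()
```

In the question's workflow, you would parse each page inside the loop over crawl(...) and save it when it is meaningful. The parsing and saving functions would be your own. Because a failed request re-raises from fut.result(), a long-running crawl would typically wrap that call in try/except.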

huangapple
  • Published on January 4, 2020 at 01:36:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59582931.html