Best way to send multiple HTTP requests at the same time when collecting data from websites
Question
I am collecting data from a website with Python for AI training. I send a request to each index of the website, one at a time. After parsing the HTML, if I find data that is meaningful for my purpose, I save it and then send a request to another index.
There are more than 5 million websites that need to be checked, so I think I should send multiple requests at a time. Otherwise, I will never get through all of them.
I am looking for the best way to send multiple requests at the same time. I know of these options: threads, multiple Python scripts, and async functions. But I am not sure which one is best.
Thank you.
Answer 1
Score: 1
I would use Requests Futures. It's a very simple asynchronous wrapper around Requests (it runs the requests on a thread pool in the background), and you can use it as follows:
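from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

# urls: an iterable of URL strings, defined elsewhere
with FuturesSession() as session:
    # Each session.get() returns a Future immediately instead of blocking
    futures = [session.get(url) for url in urls]
    # Handle responses in the order they finish, not the order they were sent
    for future in as_completed(futures):
        res = future.result()
        # res.json() assumes a JSON body; use res.text for HTML pages
        print(res.json())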
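With around 5 million URLs, building the whole list of futures up front, as in the snippet above, may use a lot of memory. One possible variation is to keep only a bounded number of requests in flight and refill the window as each one completes. The sketch below is an illustration under assumptions, not part of Requests Futures: it uses only the standard library, and the names `fetch_bounded`, `fetch_html`, `max_workers`, and `max_in_flight` are hypothetical. The `fetch` argument is any callable that takes a URL and returns the page content.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Hypothetical fetch helper: download one page and return it as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_bounded(urls, fetch, max_workers=32, max_in_flight=256):
    """Yield (url, result_or_exception) pairs as requests finish.

    At most max_in_flight requests are submitted at any time, so the
    full URL list never has to be turned into futures all at once.
    """
    url_iter = iter(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Prime the window with the first batch of requests.
        pending = {pool.submit(fetch, u): u for u in islice(url_iter, max_in_flight)}
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                url = pending.pop(future)
                # Submit one new request for each one that finished.
                for next_url in islice(url_iter, 1):
                    pending[pool.submit(fetch, next_url)] = next_url
                try:
                    result = future.result()
                except Exception as exc:
                    # Return the error instead of aborting the whole crawl.
                    result = exc
                yield url, result
```

It could be driven with something like `for url, page in fetch_bounded(urls, fetch_html): ...`, checking whether `page` is an exception before parsing it. The worker and window sizes here are placeholders and would need tuning against what the target site tolerates.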