Exceeding Page Speed API Limits using async. How can I slow it down?

# Question
I've created code to call the Page Speed Insights API.

`build_cwv_data` is an async coroutine that calls the API, then retrieves and processes the JSON data for a particular URL.

According to the documentation, the API has a limit of 400 requests per 100 seconds. Interestingly, it is at around the 100-second mark that the API starts returning a 409 error status code (quota exceeded).

My code is making approximately 775 calls in 100 seconds.

I don't understand how it is making so many calls in that time period, as I have added sleep delays to try to slow it down.

Firstly, why is it still so fast? And what can I do to slow it down?
```python
import asyncio
import time


async def retrieve_cwv_data(urls_list):
    site_id = 10234
    tasks = []
    rate_limit = 2  # maximum number of API calls per second
    interval = 1 / rate_limit  # interval between API calls in seconds
    count = 0
    start_time = time.monotonic()  # initial start time
    for url in urls_list:
        task1 = asyncio.ensure_future(build_cwv_data(site_id, url, 'mobile', psi_key))
        task2 = asyncio.ensure_future(build_cwv_data(site_id, url, 'desktop', psi_key))
        tasks.append(task1)
        tasks.append(task2)
        count += 2
        if count >= rate_limit * 2:
            elapsed_time = time.monotonic() - start_time
            if elapsed_time < interval:
                # introduce a delay to stay within the rate limit
                await asyncio.sleep(interval - elapsed_time)
            # reset the count and start time for the next second
            count = 0
            start_time = time.monotonic()
    results = await asyncio.gather(*tasks)
    return list(results)
```
# Answer 1
**Score**: 1
You put the pauses in while *creating* the tasks, but the tasks only start executing once you hand control from your code over to the asyncio event loop, in the call to `asyncio.gather`: at that point all of your tasks run as fast as possible (each task starts as soon as the previous one internally sends its request and awaits the response). In other words, you are making the calls as fast as your machine can go; the reason the errors only show up around the 100-second mark is the delay before the tasks actually start making requests.
Always keep in mind that asyncio code is just regular, serialized code running in a single thread, with explicit pause points: nothing outside the code you are looking at ever runs until you reach one of those pause points (or, of course, delegate something to another thread or process). The pause points are the `await` keyword, `async for`, and `async with`.
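To see this in isolation, here is a minimal sketch (a toy `worker` coroutine, not part of the code above) showing that creating tasks does not run them:

```python
import asyncio

async def worker(n):
    print(f"worker {n} started")
    await asyncio.sleep(0)  # explicit pause point: hands control back to the loop
    print(f"worker {n} finished")

async def main():
    # Creating tasks only schedules them; nothing has run yet.
    tasks = [asyncio.ensure_future(worker(n)) for n in range(3)]
    print("tasks created, none started")
    # The first await hands control to the event loop, and all
    # three workers then start almost simultaneously.
    await asyncio.gather(*tasks)

asyncio.run(main())
```

All three "started" lines print only after "tasks created, none started": scheduling a task does not run it; reaching a pause point does.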
You have to restructure your code so that it rate-limits the work itself. One way to do that is with a semaphore that caps the number of concurrent in-flight requests at 400, combined with a pause that keeps each slot occupied for the full quota window.
```python
import asyncio
from asyncio import Semaphore
from time import time

call_semaphore = None
timeout = 100  # length of the quota window in seconds

...

async def makecall(*args):
    async with call_semaphore:
        start = time()
        result = await build_cwv_data(*args)
        elapsed = time() - start
        # hold this task's semaphore slot until the quota window has
        # passed, so slots are only freed while observing the rate limit
        await asyncio.sleep(max(0, timeout - elapsed))
    return result


async def retrieve_cwv_data(urls_list):
    global call_semaphore
    call_semaphore = Semaphore(400)  # at most 400 requests per quota window
    site_id = 10234
    tasks = []
    for url in urls_list:
        task1 = asyncio.ensure_future(makecall(site_id, url, 'mobile', psi_key))
        task2 = asyncio.ensure_future(makecall(site_id, url, 'desktop', psi_key))
        tasks.append(task1)
        tasks.append(task2)
    results = await asyncio.gather(*tasks)
    return list(results)
```
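The design choice worth noting: each task keeps its semaphore slot for the full 100-second window rather than releasing it as soon as its own request finishes, so at most 400 calls can start within any window. A minimal driver, assuming `build_cwv_data` and `psi_key` exist as in the question and with a hypothetical `urls` list, might look like:

```python
import asyncio

# hypothetical URL list; substitute your own
urls = ['https://example.com/', 'https://example.com/pricing']

results = asyncio.run(retrieve_cwv_data(urls))
print(f"retrieved CWV data for {len(results)} page/strategy combinations")
```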
Hope this helps.