Are my requests being processed asynchronously?

Question

I have a scraping script that requests about 30 URLs synchronously. It takes 95 seconds to complete all requests.

I've rewritten the script using asynchronous libraries asyncio and aiohttp in order to improve speed.

Here are the performance statistics for 32 requests:

  • synchronous requests total time = 95 seconds
  • asynchronous requests total time = 60 seconds
  • single, manual hit on the URL from the browser = about 1 second

I think a speed improvement of 50% is very poor, so I suspect that my requests are not really firing asynchronously (I'm new to asyncio).

In fact, since a single request takes only 1 second and I make only 32 requests, I was expecting the total time for the asynchronous version to be under 1.5 seconds. 32 requests is very few, so I assumed they would all start at almost the same time, and waiting for the last one to complete shouldn't take more than 1.5 seconds.

I would appreciate any hints.

import asyncio
import time

import aiohttp

# Single asynchronous request
async def async_get_course(session, url_course):
    async with session.get(url_course) as res:
        # Read the whole response body as bytes
        response = await res.content.read()
        return response

# Main coroutine
# `courses`, `root` and `get_info_from_course` are defined elsewhere in the script (not shown)
async def example(courses, root):
    starting_time = time.time()

    actions = []
    data = []
    data2 = []
    async with aiohttp.ClientSession() as session:
        # Schedule one task per course URL
        for course in courses:
            url_course = f"{root}{course['course_link']}"
            data.append(url_course)
            actions.append(asyncio.ensure_future(async_get_course(session, url_course)))
        # Wait for all requests; results come back in the same order as `actions`
        results = await asyncio.gather(*actions)

        # Parse each response (synchronous work, included in the measured time)
        for idx, res in enumerate(results):
            data2.append(get_info_from_course((courses[idx], data[idx], res)))

    total_time = time.time() - starting_time
    print('total_time', total_time)

    return data2

# Run the coroutine
courses_final = asyncio.run(example(courses, root))

Answer 1

Score: 1

Your actions will execute concurrently:

await asyncio.gather(*actions)
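
As a rough illustration of what "concurrently" means here, the sketch below compares awaiting coroutines one by one against passing them to asyncio.gather. It is not the asker's script: fake_request, run_sequentially, run_with_gather and compare are hypothetical names, and asyncio.sleep is only an assumed stand-in for the time a real request spends waiting on the network.

```python
import asyncio
import time


async def fake_request(delay: float) -> float:
    # Hypothetical stand-in for session.get(...): awaiting the sleep
    # hands control back to the event loop, like waiting on a socket.
    await asyncio.sleep(delay)
    return delay


async def run_sequentially(n: int, delay: float) -> float:
    # Each request is awaited before the next one starts.
    start = time.perf_counter()
    for _ in range(n):
        await fake_request(delay)
    return time.perf_counter() - start


async def run_with_gather(n: int, delay: float) -> float:
    # All requests are scheduled together and their waits overlap.
    start = time.perf_counter()
    await asyncio.gather(*(fake_request(delay) for _ in range(n)))
    return time.perf_counter() - start


def compare(n: int = 32, delay: float = 0.05) -> tuple[float, float]:
    sequential = asyncio.run(run_sequentially(n, delay))
    concurrent = asyncio.run(run_with_gather(n, delay))
    return sequential, concurrent


if __name__ == "__main__":
    seq, conc = compare()
    print(f"sequential: {seq:.2f}s, gather: {conc:.2f}s")
```

With simulated waits, the gather version takes roughly as long as one request, while the sequential version takes roughly n times as long. Real requests can behave differently, since the remote server and the network are outside the event loop's control.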

So you did everything correctly: you are running asynchronous, concurrent code. Your requests will not all start at exactly the same instant, though. The event loop runs in a single thread and moves on to another request whenever the current one is blocked waiting on I/O. This doesn't mean your strategy is bad; in general, asyncio is very helpful for I/O-heavy operations like yours. You can also try splitting your actions across several processes while still using asyncio inside each one.
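
One way to see how the requests actually overlap is to record when each one starts and finishes. The following is only a sketch under assumptions: timed, fake_fetch and main are hypothetical helpers, the example.invalid URLs are placeholders, and asyncio.sleep again stands in for the real network call.

```python
import asyncio
import time


async def timed(label, awaitable, t0):
    # Record start and finish times relative to t0 for one task.
    started = time.perf_counter() - t0
    result = await awaitable
    finished = time.perf_counter() - t0
    return label, started, finished, result


async def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.05)
    return f"body of {url}"


async def main(urls):
    t0 = time.perf_counter()
    tasks = [timed(url, fake_fetch(url), t0) for url in urls]
    # gather returns results in the same order as the tasks it was given.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = [f"https://example.invalid/course/{i}" for i in range(5)]
    for label, started, finished, _ in asyncio.run(main(urls)):
        print(f"{label}: started +{started:.3f}s, finished +{finished:.3f}s")
```

In the script from the question, the same wrapper could be applied to async_get_course(session, url_course) in place of fake_fetch. If every task starts within a few milliseconds but the finish times are spread over the full 60 seconds, the time is being spent waiting for responses rather than in scheduling the requests.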

huangapple
  • Posted on 2023-03-07 02:06:08
  • Please keep this link when reposting: https://go.coder-hub.com/75654328.html