Posted August 11, 2023, 02:46:46

Is there a way to multithread or batch REST API calls in Python?

Question

I've got a very long list of keys, and I am calling a REST API with each key to GET some metadata about it.

The API can only accept one key at a time, but I wondered if there was a way I could batch or multi-thread the calls from my side?

Answer 1

Score: 0

Yes, there are ways to multithread or batch REST API calls in Python to improve the performance of your program. One way to do this is by using the concurrent.futures module which provides a high-level interface for asynchronously executing functions using threads or processes.

Here's an example that shows how you can use concurrent.futures to perform multithreaded REST API calls in batches:

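import requests
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

API_ENDPOINT = 'https://api.example.com/metadata'

def get_metadata_for_key(key):
    # Fetch metadata for one key; return None on any non-200 response.
    url = f"{API_ENDPOINT}/{key}"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    return None

def get_metadata(keys):
    results = []
    # One shared iterator, so each islice call advances through the keys.
    # Calling islice on the list itself would return the first 5 keys forever.
    key_iter = iter(keys)
    with ThreadPoolExecutor(max_workers=5) as executor:
        # iter(callable, sentinel) calls the lambda until it returns [].
        for batch in iter(lambda: list(islice(key_iter, 5)), []):
            futures = [executor.submit(get_metadata_for_key, key) for key in batch]
            # Wait for the whole batch before submitting the next one.
            results += [future.result() for future in futures]
    return results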

In this example, the get_metadata function takes a list of keys and uses a ThreadPoolExecutor to run get_metadata_for_key for each key, in batches of 5. islice takes up to 5 keys at a time from the input. executor.submit schedules one task on the thread pool per key in the batch and returns a concurrent.futures.Future object. future.result() blocks until that task finishes and returns its result, which is appended to the results list, so each batch completes before the next one is submitted and results come back in input order.

You can modify the max_workers parameter to control the number of threads used for executing tasks. In this example, I'm using 5 threads.
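If the manual batching is not needed, the same ordered result can probably be had from ThreadPoolExecutor.map, which already caps concurrency at max_workers. The following is only a minimal sketch: fetch_all_ordered and fake_fetch are hypothetical names, and fake_fetch stands in for a real request function such as get_metadata_for_key so the example runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_ordered(keys, fetch_one, max_workers=5):
    """Call fetch_one(key) for every key on a thread pool.

    Results are returned in the same order as keys. At most
    max_workers calls run at the same time.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, regardless of
        # which call finishes first.
        return list(executor.map(fetch_one, keys))


def fake_fetch(key):
    # Hypothetical stand-in for a real HTTP call (assumption for the demo).
    return {"key": key, "length": len(key)}


if __name__ == "__main__":
    print(fetch_all_ordered(["alpha", "beta", "gamma"], fake_fetch, max_workers=2))
```

One thing to keep in mind with this approach: if fetch_one raises, the exception surfaces when the results are collected, so a request function that returns None on failure (as above) is the simpler fit.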

Answer 2

Score: 0

The other reply to this looks like ChatGPT so it should be ignored.

I did, however, use its code as a base to write a function that does what I want.

import requests
import xmltodict
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

API_ENDPOINT = 'https://api.example.com/metadata'

def get_metadata_for_key(key):
    url = f"{API_ENDPOINT}/{key}"
    response = requests.get(url)
    if response.status_code == 200:
        # Assumption: the API returns XML, since the result is passed to
        # xmltodict.parse below, so return the raw body, not response.json().
        return response.text
    return None

def get_save_metadata(keys, workers):
    results = {}
    # Split the keys into batches of `workers` keys each.
    batches = [keys[i : i + workers] for i in range(0, len(keys), workers)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for batch in tqdm(batches):  # tqdm shows a progress bar
            futures = {key: executor.submit(get_metadata_for_key, key) for key in batch}
            # Wait for every result in the batch, then skip failed requests (None).
            fetched = {k: f.result() for k, f in futures.items()}
            results.update({k: xmltodict.parse(v) for k, v in fetched.items() if v is not None})
    return results
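A possible variation on the function above, sketched under stated assumptions: instead of waiting for each batch to finish, all keys are submitted at once and results are collected with as_completed as they arrive, so one slow request does not hold up the rest of its batch. The names fetch_all_by_key, fetch_one, and on_done are hypothetical; on_done is an optional callback that could drive a progress indicator, and the parsing step is left to the caller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all_by_key(keys, fetch_one, max_workers=5, on_done=None):
    """Return {key: result} for every key whose fetch produced a value.

    Keys whose fetch raised an exception or returned None are left out.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the key it was submitted for.
        future_to_key = {executor.submit(fetch_one, key): key for key in keys}
        for future in as_completed(future_to_key):
            key = future_to_key[future]
            try:
                value = future.result()
            except Exception:
                value = None  # treat a raised error like a failed request
            if value is not None:
                results[key] = value
            if on_done is not None:
                on_done(key)  # e.g. advance a progress indicator
    return results


def fake_fetch(key):
    # Hypothetical stand-in for a real HTTP call (assumption for the demo).
    if key == "bad":
        raise ValueError("simulated request failure")
    return None if key == "missing" else key.upper()


if __name__ == "__main__":
    print(fetch_all_by_key(["a", "bad", "missing", "b"], fake_fetch, max_workers=2))
```

The pool still limits concurrency to max_workers, so the API sees no more simultaneous requests than with explicit batches.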
