2023年4月11日 02:46:25go评论58阅读模式

英文:

How to apply asynchronous calls to API with Pandas apply() function

问题

我有一个约14,000行的数据框，尝试通过调用API将一些数据填充到新列中。下面的代码检索了预期的响应，但似乎每次迭代都在等待响应后才继续下一行。

这是函数：

def market_sector_des(isin):
    isin = '/isin/' + isin
    return blp.bdp(tickers=isin, flds=['market_sector_des']).iloc[0]

我正在使用xbbg来调用Bloomberg API。

.apply()函数返回了预期的响应：

df['new_column'] = df['ISIN'].apply(market_sector_des)

但每个响应大约需要2秒，对于14,000行来说大约需要8小时。

有没有办法使这个apply函数异步，以便所有请求都可以并行发送？我已经看到Dask作为一种替代方法，但是在使用它时遇到了问题。

英文:

I have a ~14,000 row dataframe and attempting to fill in some data into a new column by calling an API. The code below retrieves the expected response, however, it seems each iteration waits for a response to go to the next row.

Here is the function:

def market_sector_des(isin):
isin = &#39;/isin/&#39; + isin
return blp.bdp(tickers = isin, flds = [&#39;market_sector_des&#39;]).iloc[0]

I am using xbbg to call the Bloomberg API.

The .apply() function returns the expected response,

df[&#39;new_column&#39;] = df[&#39;ISIN&#39;].apply(market_sector_des)

but each response takes around 2 seconds, which at 14,000 lines is roughly 8 hours.

Is there any way to make this apply function asynchronous so that all requests are sent in parallel? I have seen dask as an alternative, however, I am running into issues using that as well.

答案1

得分: 1

如果以上正是您想要做的事情，那么可以通过创建一个包含要发送的股票代码语法的列，然后将该列作为一个系列通过blpapi传递来实现。

df['ISIN_NEW'] = '/isin/' + df['ISIN']
isin_new = pd.unique(df['ISIN_NEW'].dropna())
mktsec_df = blp.bdp(tickers=isin_new, flds=['market_sector_des'])

然后，您可以将新创建的df与现有的df连接，以便获得列中的数据。

newdf = pd.merge(df, mktsec_df, how='left', left_on='ISIN_NEW', right_index=True)

这应该会导致一次调用，理想情况下速度会降低到一分钟以下。如果这个方法有效，请告诉我。

英文:

If the above is exactly what you want to do, then it can be achieved by creating a column which contains the ticker syntax to be sent, and then pass that column as a series through blpapi

df[&#39;ISIN_NEW&#39;] = &#39;/isin/&#39; + df[&#39;ISIN&#39;]
isin_new = pd.unique(df[&#39;ISIN_NEW&#39;].dropna())
mktsec_df = blp.bdp(tickers = isin_new, flds = [&#39;market_sector_des&#39;])

You can then join the newly created df to your existing df, so that you get the figures in columns intact.

newdf = pd.merge(df, mktsec_df, how=&#39;left&#39;, left_on = &#39;ISIN_NEW&#39;, right_index = True )

This should result in a single call, which would ideally drop the speed to less than a minute. Do let me know if this works out.

答案2

得分: 0

以下是代码部分的翻译：

# 使用`multiprocessing`来并行执行API调用。将您的Series分成THREAD块，然后针对每个块运行一个进程：

`main.py`
```python
import multiprocessing as mp
import pandas as pd
import numpy as np
import parallel_tickers

THREADS = mp.cpu_count() - 1

# df = 您的数据框在这里
split = np.array_split(df['ISIN'], THREADS)
with mp.Pool(THREADS) as pool:
    data = pool.map(proxy_func, split)

df['new_column'] = pd.concat(data)

parallel_tickers.py

import pandas as pd
from xbbg import blp

def market_sector_des(isin):
    isin = '/isin/' + isin
    return blp.bdp(tickers = isin, flds = ['market_sector_des']).iloc[0]

def proxy_func(sr):
    return pd.Series([market_sector_des(isin) for isin in sr], index=sr.index)

EDIT: 使用另一个模块来进行mp函数


<details>
<summary>英文:</summary>

You can use `multiprocessing` to parallelize API calls. Divide your Series into THREAD chunks then run one process per chunk:

`main.py`

import multiprocessing as mp
import pandas as pd
import numpy as np
import parallel_tickers

THREADS = mp.cpu_count() - 1

df = your_dataframe_here

split = np.array_split(df['ISIN'], THREADS)
with mp.Pool(THREADS) as pool:
data = pool.map(proxy_func, split)

df['new_column'] = pd.concat(data)


`parallel_tickers.py`

import pandas as pd
from xbbg import blp

def market_sector_des(isin):
isin = '/isin/' + isin
return blp.bdp(tickers = isin, flds = ['market_sector_des']).iloc[0]

def proxy_func(sr):
return pd.Series([market_sector_des(isin) for isin in sr], index=sr.index)


EDIT: use another module for mp functions

</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何在Pandas apply()函数中应用异步调用到API

问题

答案1

答案2

df = your_dataframe_here

获取亚马逊评论的CSV文件，使用Python和AWS。

比较 pandas 数据框的列，忽略文本前面的数字。

JavaScript数据对象在条件检查中被填充，但特定对象变量的次要输出未定义。

如何在执行异步任务时成功返回 API 调用？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论