How to assess the progress of a vectorized task (in Python with pandas)

Question

Vectorizing a task speeds up its execution, but I cannot find a way to measure the progress of a vectorized task (for tasks that take a long time to complete). I've seen that tqdm might do the job, but I wonder whether there is a simpler way.

Example with a pandas DataFrame (assume the index is [0...n] and a message is printed every 1000 rows):

for idx in df.index:
    df.loc[idx, 'B'] = a_function(df.loc[idx, 'A'])
    if (idx % 1000) == 0:
        print(idx)

This will show the progress, but can be horribly slow if df has several million rows and a_function() is not trivial.

The alternative is to vectorize the operation:

df['B'] = df['A'].apply(lambda x: a_function(x))

which will probably run much faster, but it gives no hint about the progress. Any idea how to get information on the status of the vectorized task?

Answer 1

Score: 1

tqdm supports pandas objects through its progress_apply method, which becomes available after calling tqdm.pandas():

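import numpy as np
import pandas as pd
from tqdm import tqdm

# Register progress_apply (and related methods) on pandas objects
tqdm.pandas()
df = pd.DataFrame(np.random.randint(0, 100, 3_000_000), columns=['A'])
# Same as apply, but displays a progress bar while it runs
df['B'] = df['A'].progress_apply(lambda x: x**2)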

It shows progress without requiring any print statements (though it may not be convenient in all cases).
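
If adding tqdm is not an option, one possible workaround is to wrap the function in a small counter that prints a message every so many calls. The following is only a minimal sketch, not part of the original answer: with_progress is a hypothetical helper name, a_function is a placeholder for the real computation, and the sketch assumes that Series.apply calls the function once per element.

```python
import numpy as np
import pandas as pd


def with_progress(func, total, every=1000):
    """Hypothetical helper: wrap func so it reports progress every `every` calls."""
    state = {"calls": 0}

    def wrapper(x):
        state["calls"] += 1
        # Print on every `every`-th call and once more on the final call
        if state["calls"] % every == 0 or state["calls"] == total:
            print(f"{state['calls']}/{total} rows processed")
        return func(x)

    return wrapper


def a_function(x):
    # Placeholder for the expensive per-row computation
    return x ** 2


df = pd.DataFrame({"A": np.random.randint(0, 100, 5000)})
df["B"] = df["A"].apply(with_progress(a_function, total=len(df)))
```

The wrapper adds a small overhead to every call, so a larger value of every keeps the printing cost low on frames with millions of rows.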

huangapple
  • Posted on May 15, 2023 at 02:55:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76249205.html