How to assess the progress of a vectorized task (in python with pandas)
Question
Vectorizing a task speeds up execution, but I cannot find a way to measure the progress of a vectorized task (for tasks that take a long time to complete). I've seen that tqdm might do the job, but I wonder whether there is a simpler way.
Example with a pandas DataFrame (assume the index is [0...n] and a message is printed every 1000 rows):
for idx in df.index:
    df.loc[idx, 'B'] = a_function(df.loc[idx, 'A'])
    if (idx % 1000) == 0:
        print(idx)
This will show the progress, but can be horribly slow if df has several million rows and a_function() is not trivial.
The alternative is to vectorize the operation:
df['B'] = df['A'].apply(lambda x: a_function(x))
which will probably run much quicker, but it does not provide any hint about the progress. Any idea on how to get this information on the status of the vectorized task?
Answer 1
Score: 1
tqdm now supports the main generic pandas.core structures through the progress_apply method:
import numpy as np
import pandas as pd
from tqdm import tqdm

tqdm.pandas()  # registers progress_apply on pandas objects
df = pd.DataFrame(np.random.randint(0, 100, 3_000_000), columns=['A'])
df['B'] = df['A'].progress_apply(lambda x: x**2)
It shows progress without requiring a print statement (though it may not be convenient in all cases).
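If installing tqdm is not an option, one possible alternative is to wrap the function so that it reports its own call count. The following is only a minimal sketch: with_progress is a hypothetical helper (not part of pandas or tqdm), a_function stands in for the real computation, and it assumes Series.apply calls the function once per element.

```python
import pandas as pd


def with_progress(func, total, every=1000):
    """Hypothetical helper: wrap func so it prints a message every `every` calls."""
    state = {"calls": 0}

    def wrapper(x):
        state["calls"] += 1
        # Report at each multiple of `every`, and once more on the final row.
        if state["calls"] % every == 0 or state["calls"] == total:
            print(f"{state['calls']}/{total} rows processed")
        return func(x)

    return wrapper


def a_function(x):
    # Placeholder for the real, expensive computation.
    return x ** 2


df = pd.DataFrame({"A": range(5000)})
df["B"] = df["A"].apply(with_progress(a_function, total=len(df)))
```

The counter check adds a small cost to every call, so this mainly pays off when a_function itself is expensive.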