Python Polars: How to add a progress bar to apply loops

Question


Is it possible to add a progress bar to a Polars apply loop with a custom function?

For example, how would I add a progress bar to the following toy example:

    import polars as pl

    df = pl.DataFrame(
        {
            "team": ["A", "A", "A", "B", "B", "C"],
            "conference": ["East", "East", "East", "West", "West", "East"],
            "points": [11, 8, 10, 6, 6, 5],
            "rebounds": [7, 7, 6, 9, 12, 8],
        }
    )

    # Apply a custom function to each group
    df.groupby('team').apply(lambda x: x.select(pl.col('points').mean()))

Edit 1:

After help from @Jcurious, I have the following 'tools' that can be reused for other functions; however, the progress bar does not print to the console correctly.

    from rich.progress import Progress  # assumed import; not shown in the original snippet

    def pl_progress_applier(func, task_id, progress, **kwargs):
        # Advance the bar by one step, then run the wrapped function
        progress.update(task_id, advance=1, refresh=True)
        return func(**kwargs)

    def pl_groupby_progress_apply(data, group_by, func, drop_cols=None, **kwargs):
        drop_cols = drop_cols or []  # avoid a mutable default argument
        with Progress() as progress:
            # One progress step per group
            num_groups = len(data.select(group_by).unique())
            task_id = progress.add_task("Applying", total=num_groups)
            return (
                data
                .groupby(group_by)
                .apply(lambda x: pl_progress_applier(
                    x=x.drop(drop_cols), func=func, task_id=task_id, progress=progress, **kwargs)
                )
            )

Using the function custom_func, we can return a table; however, the progress bar jumps to 100%:

    def custom_func(x):
        return x.select(pl.col('points').mean())

    pl_groupby_progress_apply(
        data=df,
        group_by='team',
        func=custom_func
    )

Any ideas on how to get the progress bar to actually work?

Edit 2:

It seems the functions above do indeed work; however, if you're using PyCharm (like me), the progress bar does not display correctly. Enjoy, non-PyCharm users!

Answer 1

Score: 4


You could use rich.progress, which also comes bundled with pip.

progress.update() lets you update the progress bar manually.

    from pip._vendor.rich.progress import Progress

    def my_custom_function(group):
        # Advance the bar once per group
        progress.update(task_id, advance=1)
        return group.select(pl.col('points').mean())

    with Progress() as progress:
        num_groups = df.get_column("team").unique().len()
        task_id = progress.add_task("Applying", total=num_groups)
        df.groupby('team').apply(my_custom_function)

That said, perhaps you should share what you're actually doing, as .groupby().apply() is going to be "slow"; there may be a better way.

Answer 2

Score: 2


The best solution I found is tqdm. We want a solution that:

  1. lets us stay within the Polars coding style;
  2. is general.

To do so, all we have to define is this function:

    import polars as pl
    from tqdm import tqdm

    def w_pbar(pbar, func):
        # Wrap func so that every call advances the progress bar by one
        def foo(*args, **kwargs):
            pbar.update(1)
            return func(*args, **kwargs)
        return foo

Now we can take your original code, create pbar, and add w_pbar in the appropriate place:

    df = pl.DataFrame(
        {
            "team": ["A", "A", "A", "B", "B", "C"],
            "conference": ["East", "East", "East", "West", "West", "East"],
            "points": [11, 8, 10, 6, 6, 5],
            "rebounds": [7, 7, 6, 9, 12, 8],
        }
    )

    # One progress step per group
    num_groups = df.get_column("team").unique().len()
    with tqdm(total=num_groups) as pbar:
        res = df.groupby('team').apply(w_pbar(pbar, lambda x: x.select(pl.col('points').mean())))

You can create pbar (the tqdm object) with any settings you want and add w_pbar to any use of 'apply'.

By the way, it also works for 'apply' without 'groupby':

    # One progress step per row, since the function is applied element-wise
    pbar = tqdm(total=len(df), desc='adding 1 to points', colour='green')
    df1 = df.with_columns(pl.col('points').apply(w_pbar(pbar, lambda x: x + 1)).alias('points+1'))
    pbar.close()
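
The wrapper idea does not depend on tqdm specifically: anything with an update(n) method can be ticked once per call. The sketch below is an illustration only, not part of the answer. CountingBar and with_progress are hypothetical names, CountingBar stands in for a tqdm bar, and the built-in map stands in for Polars' apply. It uses functools.wraps so the wrapped function keeps its name, and it ticks after the call rather than before.

```python
import functools


class CountingBar:
    """Hypothetical stand-in for a tqdm bar: it only counts update() calls."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, step=1):
        self.n += step


def with_progress(pbar, func):
    """Return a version of func that advances pbar once per call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        pbar.update(1)  # tick only after the call has finished
        return result

    return wrapper


def add_one(x):
    return x + 1


points = [11, 8, 10, 6, 6, 5]
bar = CountingBar(total=len(points))

# The built-in map plays the role of an element-wise apply here.
result = list(map(with_progress(bar, add_one), points))
```

Ticking after the call means the bar reflects completed work; ticking before it, as w_pbar does, counts calls that have started. Either works as long as the bar's total matches the number of calls.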

huangapple
  • Posted on February 24, 2023 at 04:43:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75550124.html