Python Polars: How to add a progress bar to apply loops
Question
Is it possible to add a progress bar to a Polars apply loop with a custom function?
For example, how would I add a progress bar to the following toy example:
import polars as pl

df = pl.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "C"],
        "conference": ["East", "East", "East", "West", "West", "East"],
        "points": [11, 8, 10, 6, 6, 5],
        "rebounds": [7, 7, 6, 9, 12, 8]
    }
)
df.groupby('team').apply(lambda x: x.select(pl.col('points').mean()))
Edit 1:
After help from @Jcurious, I have the following 'tools' that can be reused for other functions; however, the progress bar does not print to the console correctly.
from rich.progress import Progress  # or pip._vendor.rich.progress

def pl_progress_applier(func, task_id, progress, **kwargs):
    # Advance the bar by one step, then call the wrapped function.
    progress.update(task_id, advance=1, refresh=True)
    return func(**kwargs)

def pl_groupby_progress_apply(data, group_by, func, drop_cols=[], **kwargs):
    global progress
    with Progress() as progress:
        # One progress step per distinct group.
        num_groups = len(data.select(group_by).unique())
        task_id = progress.add_task("Applying", total=num_groups)
        return (
            data
            .groupby(group_by)
            .apply(lambda x: pl_progress_applier(
                x=x.drop(drop_cols), func=func, task_id=task_id, progress=progress, **kwargs)
            )
        )
Using the function custom_func, we can return a table; however, the progress bar jumps straight to 100%:
def custom_func(x):
    return x.select(pl.col('points').mean())

pl_groupby_progress_apply(
    data=df,
    group_by='team',
    func=custom_func
)
Any ideas on how to get the progress bar to actually work?
Edit 2:
It seems the functions above do indeed work; however, the progress bar does not render correctly if you're using PyCharm (like me). Enjoy, non-PyCharm users!
Answer 1
Score: 4
You could use rich.progress, which also comes bundled with pip.
progress.update() can manually update a progress bar.
from pip._vendor.rich.progress import Progress

def my_custom_function(group):
    progress.update(task_id, advance=1)
    return group.select(pl.col('points').mean())

with Progress() as progress:
    num_groups = df.get_column("team").unique().len()
    task_id = progress.add_task("Applying", total=num_groups)
    df.groupby('team').apply(my_custom_function)
That said, perhaps you should share what you're actually doing, as .groupby.apply() is going to be "slow"; there may be a better way.
Answer 2
Score: 2
The best solution I found is tqdm. We want a solution that:
- lets us stay in the Polars coding style;
- is general.
To do so, all we have to define is this function:
import polars as pl
from tqdm import tqdm

def w_pbar(pbar, func):
    def foo(*args, **kwargs):
        pbar.update(1)
        return func(*args, **kwargs)
    return foo
Now, we could take your original code, generate pbar and add 'w_pbar' in the appropriate place:
df = pl.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "C"],
        "conference": ["East", "East", "East", "West", "West", "East"],
        "points": [11, 8, 10, 6, 6, 5],
        "rebounds": [7, 7, 6, 9, 12, 8]
    }
)
num_groups = df.get_column("team").unique().len()
with tqdm(total=num_groups) as pbar:
    res = df.groupby('team').apply(w_pbar(pbar, lambda x: x.select(pl.col('points').mean())))
You can create pbar (the tqdm object) with any settings you want, and add w_pbar to any use of 'apply'.
By the way, it also works for 'apply' without 'groupby':
pbar = tqdm(total=len(df), desc='adding 1 to points', colour='green')
df1 = df.with_columns(pl.col('points').apply(w_pbar(pbar, lambda x: x + 1)).alias('points+1'))
pbar.close()
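To see why the wrapper works independently of Polars, here is a small standard-library sketch of the same pattern. `CountingBar` is a hypothetical stand-in for a tqdm bar (it only counts `update()` calls), and `functools.wraps` is an optional addition not in the answer above; the point is that w_pbar advances the bar exactly once per call of the wrapped function.

```python
from functools import wraps


class CountingBar:
    """Hypothetical stand-in for a tqdm bar: it only counts update() calls."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, step=1):
        self.n += step


def w_pbar(pbar, func):
    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        pbar.update(1)  # one progress step per call
        return func(*args, **kwargs)
    return wrapper


points = [11, 8, 10]
bar = CountingBar(total=len(points))
add_one = w_pbar(bar, lambda x: x + 1)

# Each element triggers one call, hence one progress step.
result = list(map(add_one, points))
print(result, f"{bar.n}/{bar.total}")
```

Any object with a compatible `update()` method can be passed as `pbar`, which is what makes the wrapper general.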