February 24, 2023 04:43:26

Python Polars: How to add a progress bar to apply loops

Question

Is it possible to add a progress bar to a Polars apply loop with a custom function?

For example, how would I add a progress bar to the following toy example:

import polars as pl

df = pl.DataFrame(
    {
        "team": ["A", "A", "A", "B", "B", "C"],
        "conference": ["East", "East", "East", "West", "West", "East"],
        "points": [11, 8, 10, 6, 6, 5],
        "rebounds": [7, 7, 6, 9, 12, 8]
    }
)

# Polars < 0.19 API; newer releases spell this df.group_by('team').map_groups(...)
df.groupby('team').apply(lambda x: x.select(pl.col('points').mean()))

Edit 1:

After help from @Jcurious, I have the following 'tools' that can be reused for other functions; however, the progress bar does not print to the console correctly.

from rich.progress import Progress

def pl_progress_applier(func, task_id, progress, **kwargs):
    # Advance the bar by one step, then run the user's function on the group
    progress.update(task_id, advance=1, refresh=True)
    return func(**kwargs)

def pl_groupby_progress_apply(data, group_by, func, drop_cols=[], **kwargs):

    global progress
    with Progress() as progress:
        # One bar step per distinct group
        num_groups = len(data.select(group_by).unique())
        task_id = progress.add_task("Applying", total=num_groups)
        return (
            data
            .groupby(group_by)
            .apply(lambda x: pl_progress_applier(
                x=x.drop(drop_cols), func=func, task_id=task_id, progress=progress, **kwargs)
            )
        )

Using the function custom_func, we get a table back; however, the progress bar jumps straight to 100%:

def custom_func(x):
    return x.select(pl.col('points').mean())

pl_groupby_progress_apply(
    data=df,
    group_by='team',
    func=custom_func
)

Any ideas on how to get the progress bar to actually work?

Edit 2:

It seems the above functions do indeed work; however, if you're using PyCharm (like me), the progress bar does not display correctly. Enjoy, non-PyCharm users!


Answer 1

Score: 4

You could use rich.progress, which also comes bundled with pip.

progress.update() can manually update a progress bar.

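# pip vendors rich; if rich is installed, "from rich.progress import Progress" is the public import
from pip._vendor.rich.progress import Progress
import polars as pl

def my_custom_function(group):
    # Advance the bar once per group, then compute that group's mean points
    progress.update(task_id, advance=1)
    return group.select(pl.col('points').mean())

with Progress() as progress:
    num_groups = df.get_column("team").unique().len()
    task_id = progress.add_task("Applying", total=num_groups)

    df.groupby('team').apply(my_custom_function)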

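That said, you may want to share what you're actually doing: .groupby().apply() calls a Python function once per group and is going to be "slow", so there may be a better way.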
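The manual-advance pattern in this answer does not depend on Polars itself. The dependency-free sketch below assumes a hypothetical FakeProgress class that mimics only the add_task/update calls used above, and a plain dict grouping standing in for groupby. It illustrates the invariant that matters: the bar's total should equal the number of times the per-group function runs.

```python
from collections import defaultdict
from statistics import mean


class FakeProgress:
    """Hypothetical stand-in for rich's Progress: records completion per task."""

    def __init__(self):
        self.tasks = {}

    def add_task(self, description, total):
        task_id = len(self.tasks)
        self.tasks[task_id] = {"description": description, "total": total, "completed": 0}
        return task_id

    def update(self, task_id, advance=0):
        self.tasks[task_id]["completed"] += advance


def grouped_apply_with_progress(rows, key, func, progress):
    # Group rows by key, roughly what groupby does.
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    # One step per distinct group, like df.get_column("team").unique().len().
    task_id = progress.add_task("Applying", total=len(groups))
    results = {}
    for group_key, group_rows in groups.items():
        progress.update(task_id, advance=1)  # advance once per group
        results[group_key] = func(group_rows)
    return results, task_id


rows = [
    {"team": t, "points": p}
    for t, p in zip(["A", "A", "A", "B", "B", "C"], [11, 8, 10, 6, 6, 5])
]
progress = FakeProgress()
means, task_id = grouped_apply_with_progress(
    rows, "team", lambda g: mean(r["points"] for r in g), progress
)
print(means)
print(progress.tasks[task_id])
```

If completed and total ever disagree, the bar either stops short or overshoots, which is a quick thing to check when a real bar looks wrong.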


Answer 2

Score: 2

The best solution I found is tqdm. We want a solution that
 1. lets us stay in the Polars coding style, and
 2. is general.
To do so, all we have to define is this function:
    import polars as pl
    from tqdm import tqdm
    
    def w_pbar(pbar, func):
        # Wrap func so that every call advances pbar by one step
        def foo(*args, **kwargs):
            pbar.update(1)
            return func(*args, **kwargs)
    
        return foo
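A quick way to check that w_pbar only counts calls, and otherwise passes arguments and return values through unchanged, is to drive it with a stub in place of tqdm. The StubBar class below is hypothetical and implements only the update(n) method that w_pbar calls; the functools.wraps decorator is an optional addition (not part of the answer) that keeps the wrapped function's name and docstring.

```python
import functools


class StubBar:
    """Hypothetical stand-in for a tqdm bar: only tracks update() calls."""

    def __init__(self, total):
        self.total = total
        self.n = 0

    def update(self, n=1):
        self.n += n


def w_pbar(pbar, func):
    @functools.wraps(func)  # optional: preserve func's __name__ and __doc__
    def wrapper(*args, **kwargs):
        pbar.update(1)  # one tick per call, before running func
        return func(*args, **kwargs)

    return wrapper


def group_mean(values):
    """Mean of one group's values."""
    return sum(values) / len(values)


groups = {"A": [11, 8, 10], "B": [6, 6], "C": [5]}
bar = StubBar(total=len(groups))
wrapped = w_pbar(bar, group_mean)
results = {team: wrapped(vals) for team, vals in groups.items()}
print(results, bar.n, bar.total)
print(wrapped.__name__)  # group_mean
```

Because the wrapper ticks before calling func, a function that raises still counts as one step; moving pbar.update(1) after the call would count only successful calls.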
Now we can take your original code, create the pbar, and add 'w_pbar' in the appropriate place:
    df = pl.DataFrame(
        {
            "team": ["A", "A", "A", "B", "B", "C"],
            "conference": ["East", "East", "East", "West", "West", "East"],
            "points": [11, 8, 10, 6, 6, 5],
            "rebounds": [7, 7, 6, 9, 12, 8]
        }
    )
    num_groups = df.get_column("team").unique().len()
    with tqdm(total=num_groups) as pbar:
        res = df.groupby('team').apply(w_pbar(pbar, lambda x: x.select(pl.col('points').mean())))
You can create the pbar (the tqdm object) with whatever settings you want, and add w_pbar to any use of 'apply'.
By the way, it also works for 'apply' without 'groupby':
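    # Element-wise apply: one tick per row, so total is len(df)
    # (Expr.apply was renamed to Expr.map_elements in Polars 0.19)
    pbar = tqdm(total=len(df), desc='adding 1 to points', colour='green')
    df1 = df.with_columns(pl.col('points').apply(w_pbar(pbar, lambda x: x + 1)).alias('points+1'))
    pbar.close()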
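For the element-wise case the bar's total is the number of rows (len(df)), not the number of groups. The answer calls pbar.close() by hand, which is skipped if the applied function raises; using the bar as a context manager, as the groupby example does, avoids that. The sketch below is dependency-free: StubBar is a hypothetical stand-in for tqdm, and a plain list takes the place of the points column.

```python
class StubBar:
    """Hypothetical tqdm stand-in supporting update(), close() and `with`."""

    def __init__(self, total):
        self.total = total
        self.n = 0
        self.closed = False

    def update(self, n=1):
        self.n += n

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions


def w_pbar(pbar, func):
    def foo(*args, **kwargs):
        pbar.update(1)
        return func(*args, **kwargs)

    return foo


points = [11, 8, 10, 6, 6, 5]

# Element-wise apply: the total is the number of rows, not the number of groups.
with StubBar(total=len(points)) as bar:
    add_one = w_pbar(bar, lambda x: x + 1)
    plus_one = [add_one(p) for p in points]
print(plus_one, bar.n, bar.closed)  # [12, 9, 11, 7, 7, 6] 6 True


def fail_on_six(x):
    if x == 6:
        raise ValueError("bad value")
    return x


# If the function fails part-way, `with` still closes the bar.
failing_bar = StubBar(total=len(points))
try:
    with failing_bar:
        checked = w_pbar(failing_bar, fail_on_six)
        for p in points:
            checked(p)
except ValueError:
    pass
print(failing_bar.n, failing_bar.closed)  # 4 True
```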
