April 11, 2023

python-polars Join Column Values into a concatenated string

Question

I am trying to write an aggregation routine where values in columns are concatenated based on a groupby statement.

I am trying to call a custom function to do the aggregation, and I am also trying to avoid using lambda (my understanding is that lambda functions only run serially, so performance would be slower).

Here is my code:

def agg_ll_field(col_name) -> pl.Expr:
    return ';'.join(pl.col(col_name).drop_nulls().unique().sort())
   
dfa = df.lazy()\
    .groupby(by=['SharedSourceSystem', 'FOPortfolioName']).agg(
    [
        , agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
    ]).collect()

I keep on getting an error:

agg_ll_field: Unexpected:  can only join an iterable   <class 'TypeError'>

Would anyone be able to help resolve this?

Thank you!

I tried using the apply function instead. That seems to work, but I'm trying to avoid apply, since its performance is supposed to be worse.

Answer 1

Score: 1

Here is the full example using str.concat:

import polars as pl

# Create a sample DataFrame
data = {
    'SharedSourceSystem': ['A', 'A', 'B', 'B', 'B'],
    'FOPortfolioName': ['X', 'X', 'Y', 'Y', 'Y'],
    'BookingUnits': [1, 2, 2, 2, 3]
}

df = pl.DataFrame(data)

# Define the custom aggregation function. It returns a Polars expression,
# so the concatenation runs inside the query engine rather than in Python.
def agg_ll_field(col_name) -> pl.Expr:
    return pl.col(col_name).drop_nulls().unique().sort().str.concat(';')

# Apply the lazy groupby and aggregation
# (Polars 0.19 renamed groupby to group_by; newer releases deprecate
# str.concat in favor of str.join)
dfa = df.lazy()\
    .groupby(by=['SharedSourceSystem', 'FOPortfolioName']).agg(
    [
        agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
    ]).collect()

print(dfa)

# Output

┌────────────────────┬─────────────────┬────────────┐
│ SharedSourceSystem ┆ FOPortfolioName ┆ BOOKG_UNIT │
│ ---                ┆ ---             ┆ ---        │
│ str                ┆ str             ┆ str        │
╞════════════════════╪═════════════════╪════════════╡
│ A                  ┆ X               ┆ 1;2        │
│ B                  ┆ Y               ┆ 2;3        │
└────────────────────┴─────────────────┴────────────┘


