python-polars Join Column Values into a concatenated string

Question

I am trying to write an aggregation routine in which column values are concatenated into a single string for each group of a groupby statement.

I am trying to call a custom function to do the aggregation, and I am also trying to avoid using a lambda (my understanding is that lambda functions only run serially, so performance would be slower).

Here is my code:

import polars as pl

def agg_ll_field(col_name) -> pl.Expr:
    # raises TypeError: can only join an iterable
    return ';'.join(pl.col(col_name).drop_nulls().unique().sort())

dfa = df.lazy()\
    .groupby(by=['SharedSourceSystem', 'FOPortfolioName']).agg(
    [
        agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
    ]).collect()

I keep on getting an error:

agg_ll_field: Unexpected:  can only join an iterable   <class 'TypeError'>

Would anyone be able to help resolve this?

Thank you!

I tried using the apply function instead. That seems to work, but I'm trying to avoid apply, since its performance is supposed to be worse.

Answer 1

Score: 1

Here is the full example using str.concat:

import polars as pl

# Create a sample DataFrame
data = {
    'SharedSourceSystem': ['A', 'A', 'B', 'B', 'B'],
    'FOPortfolioName': ['X', 'X', 'Y', 'Y', 'Y'],
    'BookingUnits': [1, 2, 2, 2, 3]
}

df = pl.DataFrame(data)

# Define the custom aggregation function; str.concat joins each
# group's remaining values into one ';'-separated string
def agg_ll_field(col_name) -> pl.Expr:
    return pl.col(col_name).drop_nulls().unique().sort().str.concat(';')

# Apply the lazy groupby and aggregation
dfa = df.lazy()\
    .groupby(by=['SharedSourceSystem', 'FOPortfolioName']).agg(
    [
        agg_ll_field('BookingUnits').alias('BOOKG_UNIT')
    ]).collect()

print(dfa)

# Output

┌────────────────────┬─────────────────┬────────────┐
│ SharedSourceSystem ┆ FOPortfolioName ┆ BOOKG_UNIT │
│ ---                ┆ ---             ┆ ---        │
│ str                ┆ str             ┆ str        │
╞════════════════════╪═════════════════╪════════════╡
│ A                  ┆ X               ┆ 1;2        │
│ B                  ┆ Y               ┆ 2;3        │
└────────────────────┴─────────────────┴────────────┘



huangapple
  • This article was published on April 11, 2023, 01:01:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75979059.html