Python-polars: from coefficients, values and (nested) lists to weighted values

Question

Let's say I've got a Polars DataFrame similar to this one:

import polars as pl
from decimal import Decimal

pl.Config.activate_decimals()

df = pl.from_dicts(
    [
        {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
        {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
        {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
    ]
)
id (pl.Int64) | value (pl.Decimal) | items (pl.List(str))
--------------+--------------------+---------------------
1             | 100                | ["A"]
2             | 150                | ["A", "B"]
3             | 70                 | ["A", "B", "C"]

And the following Python dictionary:

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45") }

From here, how can I get the following DataFrame efficiently using Polars?

id (pl.Int64) | value (pl.Decimal)  | item (str)
--------------+---------------------+-----------
1             | 100                 | "A"
2             | 54.5454545454545454 | "A"
2             | 95.4545454545454545 | "B"
3             | 14.00000            | "A"
3             | 24.500000           | "B"
3             | 31.500000           | "C"

In this example, 54.5454... corresponds, for instance, to 0.2 / (0.2 + 0.35) * 150.
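You can check that arithmetic with Python's standard decimal module. The sketch below works out the id 2 and id 3 rows by hand and does not use Polars. For id 3 the items cover every key, so the normalizer is exactly 1.

```python
from decimal import Decimal

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

# id 2: items ["A", "B"], so the normalizer is coef["A"] + coef["B"] = 0.55
value_2 = Decimal("150.000")
denom_2 = coef["A"] + coef["B"]
share_a = coef["A"] / denom_2 * value_2  # about 54.5454...
share_b = coef["B"] / denom_2 * value_2  # about 95.4545...

# id 3: items ["A", "B", "C"] cover every key, so the normalizer is 1.00
value_3 = Decimal("70.0000")
denom_3 = coef["A"] + coef["B"] + coef["C"]
shares_3 = [coef[k] / denom_3 * value_3 for k in ("A", "B", "C")]

print(share_a, share_b)
print(shares_3)
```

With the default 28-digit decimal context, the id 2 shares are rounded. The id 3 shares are exact (14.00000, 24.500000, 31.500000), which is where the scales in the expected output come from.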

I've tried various combinations of .explode, .map_dict, .when / .then / .otherwise, and .cast with .arr.lengths, but I'm still struggling to get the expected output in a clean way.

Note that the length of the coef dictionary varies. It generally looks like {key1: coef1, key2: coef2, ...}, with n keys and n values whose sum equals Decimal(1). Also, I'm working with Decimal rather than float64 values, even though Decimal support in Polars is an "experimental work-in-progress feature". I'm using Polars 0.17.2.
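The number of keys varies, so the normalizer has to be computed per row from whatever items that row lists. Below is a framework-free reference for the whole transformation, written with the decimal module. It is a sketch, not a Polars solution. The helper name `weighted_rows` is hypothetical, and the code assumes every item in a row appears in `coef`.

```python
from decimal import Decimal


def weighted_rows(rows, coef):
    """Hypothetical helper: explode each row's items and split its value
    in proportion to coef, i.e. coef[item] / sum(coef over row items) * value.
    Works for a coef dict of any length; raises KeyError for an unknown item."""
    # Enforces the invariant stated in the question; the normalization
    # below does not itself depend on it.
    if sum(coef.values(), Decimal(0)) != Decimal(1):
        raise ValueError("coefficients are expected to sum to 1")
    out = []
    for row in rows:
        # Per-row normalizer: only the coefficients of the items in this row
        denom = sum((coef[item] for item in row["items"]), Decimal(0))
        for item in row["items"]:
            out.append({"id": row["id"], "value": coef[item] / denom * row["value"], "item": item})
    return out


rows = [
    {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
    {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
    {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
]
coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

for r in weighted_rows(rows, coef):
    print(r)
```

The nested loop does the job of `.explode("items")`. Summing inside each row's own list avoids any grouping by `id`.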

Answer 1

Score: 1

One way to write it could be:

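# df is the DataFrame from the question; coef is the Python dict of coefficients
df.explode("items").with_columns(
    # one row per item; look up each item's coefficient
    coef=pl.col("items").map_dict(coef)
).with_columns(
    # divide by the per-id sum of coefficients, then scale by value
    pl.col("coef") / pl.col("coef").sum().over("id") * pl.col("value")
)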
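To see what the second `with_columns` computes, here is the same logic in plain Python on the already-exploded rows. `map_dict` becomes a dict lookup, and `pl.col("coef").sum().over("id")` becomes a per-id total that is broadcast back to every row with that `id`. This only illustrates the semantics; it is not how Polars evaluates the expression.

```python
from collections import defaultdict
from decimal import Decimal

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

# Rows as they look after df.explode("items"): (id, value, item)
exploded = [
    (1, Decimal("100.0"), "A"),
    (2, Decimal("150.000"), "A"),
    (2, Decimal("150.000"), "B"),
    (3, Decimal("70.0000"), "A"),
    (3, Decimal("70.0000"), "B"),
    (3, Decimal("70.0000"), "C"),
]

# map_dict(coef): attach each item's coefficient
mapped = [(id_, value, item, coef[item]) for id_, value, item in exploded]

# sum().over("id"): total coefficient per id ...
totals = defaultdict(Decimal)
for id_, _, _, c in mapped:
    totals[id_] += c

# ... broadcast back to each row, then coef / total * value
result = [(id_, value, item, c / totals[id_] * value) for id_, value, item, c in mapped]

for row in result:
    print(row)
```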

shape: (6, 4)
┌─────┬──────────┬───────┬───────────────┐
│ id  ┆ value    ┆ items ┆ coef          │
│ --- ┆ ---      ┆ ---   ┆ ---           │
│ i64 ┆ f64      ┆ str   ┆ f64           │
╞═════╪══════════╪═══════╪═══════════════╡
│ 1   ┆ 1e6      ┆ A     ┆ 1e6           │
│ 2   ┆ 1.5e6    ┆ A     ┆ 545454.545455 │
│ 2   ┆ 1.5e6    ┆ B     ┆ 954545.454545 │
│ 3   ┆ 700000.0 ┆ A     ┆ 140000.0      │
│ 3   ┆ 700000.0 ┆ B     ┆ 245000.0      │
│ 3   ┆ 700000.0 ┆ C     ┆ 315000.0      │
└─────┴──────────┴───────┴───────────────┘

However, with Decimals, it raises an exception:

thread '<unnamed>' panicked at 'not implemented for Decimal(None, Some(2))', 
./git/polars/polars/polars-lazy/src/physical_plan/expressions/window.rs:685:9
PanicException: not implemented for Decimal(None, Some(2))

Posted by huangapple on 13 April 2023, 18:45:12
Please keep this link when reposting: https://go.coder-hub.com/76004515.html