Python-polars: from coefficients, values and (nested) lists to weighted values
Question
Let's say I've got a Polars DataFrame similar to this one:
import polars as pl
from decimal import Decimal

pl.Config.activate_decimals()

df = pl.from_dicts(
    [
        {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
        {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
        {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
    ]
)
id → pl.Int64 | value → pl.Decimal | items → pl.List(str) |
---|---|---|
1 | 100 | ["A"] |
2 | 150 | ["A", "B"] |
3 | 70 | ["A", "B", "C"] |
And the following Python dictionary:
coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45") }
From here, how can I get the following DataFrame efficiently using Polars?
id → pl.Int64 | value → pl.Decimal | item → str |
---|---|---|
1 | 100 | "A" |
2 | 54.5454545454545454 | "A" |
2 | 95.4545454545454545 | "B" |
3 | 14.00000 | "A" |
3 | 24.500000 | "B" |
3 | 31.500000 | "C" |
In this example, 54.5454... for instance corresponds to 0.2 / (0.2 + 0.35) * 150.
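The renormalisation in that example can be checked with Python's decimal module alone. This is only a sketch of the arithmetic for the row with id 2, not a Polars solution; the names value, items, total and shares are illustrative.

```python
from decimal import Decimal

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

# Row with id 2: the value 150 is split across items A and B only.
value = Decimal("150.000")
items = ["A", "B"]

# Renormalise the coefficients over the items present in this row.
total = sum(coef[item] for item in items)  # 0.2 + 0.35

# Each item's share is its coefficient divided by the row total, times the value.
shares = {item: coef[item] / total * value for item in items}

for item, share in shares.items():
    print(item, share)
```

Item C does not appear in this row, so its coefficient is left out of the denominator and the two shares add back up to (approximately) 150.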
I've tried various things using .explode, .map_dict, .when / .then / .otherwise / .cast with .arr.lengths, but I'm still struggling to get the expected output in a clean way.
Note that the length of the coef dictionary isn't always the same. It basically looks like {key1: coef1, key2: coef2, ... } with n keys and n values (whose sum is equal to Decimal(1)). Also, I'm working with Decimal (not float64 values), even if this is an "experimental work-in-progress feature" (see here). I'm using polars '0.17.2'.
Answer 1
Score: 1
One way to write it could be:
df.explode("items").with_columns(
    coef = pl.col("items").map_dict(coef)
).with_columns(
    pl.col("coef") / pl.col("coef").sum().over("id") * pl.col("value")
)
shape: (6, 4)
┌─────┬──────────┬───────┬───────────────┐
│ id  ┆ value    ┆ items ┆ coef          │
│ --- ┆ ---      ┆ ---   ┆ ---           │
│ i64 ┆ f64      ┆ str   ┆ f64           │
╞═════╪══════════╪═══════╪═══════════════╡
│ 1   ┆ 1e6      ┆ A     ┆ 1e6           │
│ 2   ┆ 1.5e6    ┆ A     ┆ 545454.545455 │
│ 2   ┆ 1.5e6    ┆ B     ┆ 954545.454545 │
│ 3   ┆ 700000.0 ┆ A     ┆ 140000.0      │
│ 3   ┆ 700000.0 ┆ B     ┆ 245000.0      │
│ 3   ┆ 700000.0 ┆ C     ┆ 315000.0      │
└─────┴──────────┴───────┴───────────────┘
However, with Decimals, it raises an exception:
thread '<unnamed>' panicked at 'not implemented for Decimal(None, Some(2))',
./git/polars/polars/polars-lazy/src/physical_plan/expressions/window.rs:685:9
PanicException: not implemented for Decimal(None, Some(2))
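Since the window expression panics on Decimal columns in this version, one possible fallback is to do the weighting in plain Python with the decimal module before building the DataFrame. The sketch below is not a Polars-native solution and will be slower on large data; weighted_rows is a hypothetical helper name, and it assumes every item in a row has a key in coef.

```python
from decimal import Decimal


def weighted_rows(rows, coef):
    """Split each row's value across its items, weighted by coef.

    Hypothetical helper: assumes every item has an entry in coef.
    """
    out = []
    for row in rows:
        # Sum of the coefficients for the items present in this row only.
        total = sum(coef[item] for item in row["items"])
        for item in row["items"]:
            out.append(
                {
                    "id": row["id"],
                    "value": coef[item] / total * row["value"],
                    "item": item,
                }
            )
    return out


rows = [
    {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
    {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
    {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
]
coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

result = weighted_rows(rows, coef)
for r in result:
    print(r)
```

The resulting list of dicts could then be passed to pl.from_dicts as in the question, though how well the Decimal dtype is handled from there depends on the Polars version.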