April 13, 2023 18:45:12

Python-polars: from coefficients, values and (nested) lists to weighted values

Question

Let's say I've got a Polars DataFrame similar to this one:

import polars as pl
from decimal import Decimal
pl.Config.activate_decimals()
df = pl.from_dicts(
    [
        {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
        {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
        {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
    ]
)

id → pl.Int64	value → pl.Decimal	items → pl.List(str)
1	100	`["A"]`
2	150	`["A", "B"]`
3	70	`["A", "B", "C"]`

And the following Python dictionary:

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45") }

From here, how can I get the following DataFrame efficiently using Polars?

id → pl.Int64	value → pl.Decimal	item → str
1	100	"A"
2	54.5454545454545454	"A"
2	95.4545454545454545	"B"
3	14.00000	"A"
3	24.500000	"B"
3	31.500000	"C"

In this example, 54.5454... for instance corresponds to 0.2 / (0.2 + 0.35) * 150.
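The arithmetic above can be checked with the standard-library `decimal` module alone. This is only a sketch of the expected numbers, not a Polars solution; the helper name `weighted_share` is made up for illustration.

```python
from decimal import Decimal

coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}


def weighted_share(value, items, item):
    """Share of `value` assigned to `item`, with coefficients renormalized over `items`."""
    total = sum(coef[i] for i in items)  # e.g. 0.2 + 0.35 for ["A", "B"]
    return coef[item] / total * value


# Row id=2: 150 split between A and B.
share_a = weighted_share(Decimal("150.000"), ["A", "B"], "A")
share_b = weighted_share(Decimal("150.000"), ["A", "B"], "B")

# Row id=3: all three items present, so the coefficients already sum to 1.
share_3a = weighted_share(Decimal("70.0000"), ["A", "B", "C"], "A")

print(share_a, share_b, share_3a)
```

The two shares for id 2 add back up to 150 (up to the precision of the default `decimal` context), which is the property the expected output relies on.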

I've tried various things using .explode, .map_dict, and .when / .then / .otherwise / .cast with .arr.lengths, but I'm still struggling to get the expected output in a clean way.

Note that the length of the coef dictionary isn't always the same. It basically looks like {key1: coef1, key2: coef2, ...} with n keys and n values (whose sum is equal to Decimal(1)). Also, I'm working with Decimal (not float64) values, even though this is an "experimental work-in-progress feature". I'm using Polars 0.17.2.

Answer 1

Score: 1

One way to write it could be:

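# Assumes `df` and `coef` are defined as in the question.
df.explode("items").with_columns(
    # Look up each item's coefficient.
    coef=pl.col("items").map_dict(coef)
).with_columns(
    # Renormalize within each id, then scale by value (overwrites "coef").
    pl.col("coef") / pl.col("coef").sum().over("id") * pl.col("value")
)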

shape: (6, 4)
┌─────┬──────────┬───────┬───────────────┐
│ id  ┆ value    ┆ items ┆ coef          │
│ --- ┆ ---      ┆ ---   ┆ ---           │
│ i64 ┆ f64      ┆ str   ┆ f64           │
╞═════╪══════════╪═══════╪═══════════════╡
│ 1   ┆ 1e6      ┆ A     ┆ 1e6           │
│ 2   ┆ 1.5e6    ┆ A     ┆ 545454.545455 │
│ 2   ┆ 1.5e6    ┆ B     ┆ 954545.454545 │
│ 3   ┆ 700000.0 ┆ A     ┆ 140000.0      │
│ 3   ┆ 700000.0 ┆ B     ┆ 245000.0      │
│ 3   ┆ 700000.0 ┆ C     ┆ 315000.0      │
└─────┴──────────┴───────┴───────────────┘

The output above uses f64 columns. With Decimal columns, the same code panics:

thread '<unnamed>' panicked at 'not implemented for Decimal(None, Some(2))', 
./git/polars/polars/polars-lazy/src/physical_plan/expressions/window.rs:685:9

PanicException: not implemented for Decimal(None, Some(2))
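
If Decimal precision has to be kept and the window expression is not available for Decimal columns, one possible fallback is to do the explode, renormalize, and multiply steps in plain Python before handing the rows to Polars. The sketch below is not part of the original answer. It assumes the data fits in memory as a list of dicts, and the function name `explode_weighted` is hypothetical.

```python
from decimal import Decimal


def explode_weighted(rows, coef):
    """Yield one row per (id, item), splitting `value` by renormalized coefficients."""
    for row in rows:
        # Sum only the coefficients of the items present in this row.
        total = sum(coef[item] for item in row["items"])
        for item in row["items"]:
            yield {
                "id": row["id"],
                "value": coef[item] / total * row["value"],
                "item": item,
            }


rows = [
    {"id": 1, "value": Decimal("100.0"), "items": ["A"]},
    {"id": 2, "value": Decimal("150.000"), "items": ["A", "B"]},
    {"id": 3, "value": Decimal("70.0000"), "items": ["A", "B", "C"]},
]
coef = {"A": Decimal("0.2"), "B": Decimal("0.35"), "C": Decimal("0.45")}

result = list(explode_weighted(rows, coef))
for r in result:
    print(r)
```

The resulting list could then be passed to `pl.from_dicts`, as in the question, provided the Decimal support in the installed Polars version accepts it. This trades Polars' vectorized execution for exact Decimal arithmetic, so it is likely to be slower on large frames.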
