Create duplicates of rows based on column values

Question

I'm trying to build a histogram of some data in Polars. As part of my histogram code, I need to duplicate some rows. I've got a column of values, and each row also has a weight that says how many times the row should be added to the histogram.

How can I duplicate my value rows according to the weight column?

Here is some example data, with a target series:

```python
import polars as pl

df = pl.DataFrame({"value":[1,2,3], "weight":[2, 2, 1]})

print(df)
# shape: (3, 2)
# ┌───────┬────────┐
# │ value ┆ weight │
# │ ---   ┆ ---    │
# │ i64   ┆ i64    │
# ╞═══════╪════════╡
# │ 1     ┆ 2      │
# │ 2     ┆ 2      │
# │ 3     ┆ 1      │
# └───────┴────────┘

s_target = pl.Series(name="value", values=[1,1,2,2,3])
print(s_target)
# shape: (5,)
# Series: 'value' [i64]
# [
# 	1
# 	1
# 	2
# 	2
# 	3
# ]
```
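Stated in plain Python, the target repeats each `value` exactly `weight` times, keeping the original row order. A minimal sketch that pins down the expected output independently of any library (the helper name `expand` is hypothetical, not part of Polars):

```python
def expand(values, weights):
    """Repeat each value by its matching weight, preserving row order.

    `expand` is a hypothetical helper used only to spell out the target.
    A weight of 0 contributes no copies.
    """
    return [v for v, w in zip(values, weights) for _ in range(w)]


# Same data as the `value` and `weight` columns above.
print(expand([1, 2, 3], [2, 2, 1]))  # [1, 1, 2, 2, 3]
```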

Answer 1

Score: 4


How about:

```python
(
    df.with_columns(
        pl.col("value").repeat_by(pl.col("weight"))
    )
    .select(pl.col("value").arr.explode())
)
```
```python
In [11]: df.with_columns(pl.col('value').repeat_by(pl.col('weight'))).select(pl.col('value').arr.explode())
Out[11]:
shape: (5, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 1     │
│ 1     │
│ 2     │
│ 2     │
│ 3     │
└───────┘
```

I didn't know you could do this so easily; I only learned about it while writing the answer. Polars is so nice.

Answer 2

Score: 2


Turns out [`repeat_by`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.repeat_by.html) and a subsequent [`explode`](https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.arr.explode.html#polars.Expr.arr.explode) are the perfect building blocks for this transformation:

```python
>>> df.select(pl.col('value').repeat_by('weight').arr.explode())
shape: (5, 1)
┌───────┐
│ value │
│ ---   │
│ i64   │
╞═══════╡
│ 1     │
│ 1     │
│ 2     │
│ 2     │
│ 3     │
└───────┘
```

huangapple
  • This article was published on February 14, 2023 at 22:06:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75448985.html