Polars: Pad list columns to specific size
Question
I think I ran into the XY problem...
Here is what I actually want to do:
To be exact, I have a dataframe like this:
shape: (3, 3)
┌───────────┬───────┬──────────────────────────┐
│ nrs ┆ stuff ┆ more_stuff │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ list[list[i64]] │
╞═══════════╪═══════╪══════════════════════════╡
│ [1, 2, 3] ┆ 1 ┆ [[1, 1], [2, 2], [3, 3]] │
│ [2, 4] ┆ 2 ┆ [[4, 4], [5, 5]] │
│ [1] ┆ 3 ┆ [[6, 6]] │
└───────────┴───────┴──────────────────────────┘
It has normal int64 columns, list[i64] columns and one list[list[i64]] column.
I want to be able to specify a size N and set the length of every list in the list columns (the outer lists of the nested column included) to that size: either by truncating to N, or by padding with the list's last value (list[-1] for normal Python lists). The non-list columns should be left unchanged.
So the result for N=2 for the above dataframe should be:
shape: (3, 3)
┌───────────┬───────┬──────────────────┐
│ nrs ┆ stuff ┆ more_stuff │
│ --- ┆ --- ┆ --- │
│ list[i64] ┆ i64 ┆ list[list[i64]] │
╞═══════════╪═══════╪══════════════════╡
│ [1, 2] ┆ 1 ┆ [[1, 1], [2, 2]] │
│ [2, 4] ┆ 2 ┆ [[4, 4], [5, 5]] │
│ [1, 1] ┆ 3 ┆ [[6, 6], [6, 6]] │
└───────────┴───────┴──────────────────┘
Answer 1
Score: 1
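Answer via Reddit from /u/commandlineluser:
import polars as pl

df = pl.DataFrame({
    "nrs": [[1, 2, 3], [2, 4], [1]],
    "stuff": [1, 2, 3],
    "more_stuff": [[[1, 1], [2, 2], [3, 3]], [[4, 4], [5, 5]], [[6, 6]]]
})
cols = "nrs", "more_stuff"
# Written for an older Polars release: the list namespace was later renamed
# from .arr to .list, so newer versions need the updated method names.
df.with_columns(
    # Index every list at positions 0..max_len-1; out-of-bounds positions
    # become null, and each null is then filled with the preceding value.
    pl.col(cols).arr.take(
        pl.arange(0, pl.col(cols).arr.lengths().max()),
        null_on_oob=True
    ).arr.eval(pl.element().forward_fill())
)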
# shape: (3, 3)
# ┌───────────┬───────┬──────────────────────────┐
# │ nrs ┆ stuff ┆ more_stuff │
# │ --- ┆ --- ┆ --- │
# │ list[i64] ┆ i64 ┆ list[list[i64]] │
# ╞═══════════╪═══════╪══════════════════════════╡
# │ [1, 2, 3] ┆ 1 ┆ [[1, 1], [2, 2], [3, 3]] │
# │ [2, 4, 4] ┆ 2 ┆ [[4, 4], [5, 5], [5, 5]] │
# │ [1, 1, 1] ┆ 3 ┆ [[6, 6], [6, 6], [6, 6]] │
# └───────────┴───────┴──────────────────────────┘
This pads every list to the length of the longest one. You can add a .slice()/.head() to trim them to a smaller length.
EDIT:
It is possibly worth noting that .forward_fill() doesn't specifically target the padded nulls, so it could be an issue if there were nulls in the initial data. It's possible to handle this, but it requires a bit more code.
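To make the null caveat concrete, the intended semantics can be written out in plain Python. This is only a reference sketch for comparing results, not a Polars solution; `fit_to_length` is a hypothetical helper name introduced here.

```python
import copy
from typing import Any, List


def fit_to_length(values: List[Any], n: int) -> List[Any]:
    """Truncate `values` to `n` items, or pad it by repeating its last item."""
    if n <= 0 or not values:
        # Nothing to keep, or no last element available to repeat.
        return values[:max(n, 0)]
    # range() is empty when the list is already long enough, so no padding is added.
    # deepcopy keeps padded inner lists independent of the original last element.
    padding = [copy.deepcopy(values[-1]) for _ in range(n - len(values))]
    return values[:n] + padding


print(fit_to_length([1, 2, 3], 2))   # truncation
print(fit_to_length([[6, 6]], 2))    # padding a nested list
print(fit_to_length([None, 2], 3))   # the leading None is left untouched
print(fit_to_length([1, None], 3))   # the last value is None, so None is repeated
```

The last call is where the two approaches differ: a forward fill would replace the existing None with 1, whereas padding with the last element keeps it.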
Answer 2
Score: 0
Take a look at extend_constant.
You can use head / tail for the shortening.