How do I fill in missing factors in a polars dataframe?
Question
I have this dataframe:
import polars as pl
testdf = pl.DataFrame({'date':['date1','date1','date1','date2','date3','date3'], 'factor':['A','B','C','B','B','C'], 'val':[1,2,3,3,1,5]})
Some date/factor combinations are missing (for example, date2 only has factor B). I'd like to fill in the gaps with the value 0.
Answer 1
Score: 1
This is what I have so far (with help from comment below):
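(
    testdf
    # gather the unique values of each column into one list per column
    .select(pl.col(['date','factor']).unique().implode())
    # exploding both list columns in turn yields every (date, factor) pair
    .explode('date')
    .explode('factor')
    # the left join keeps existing values; pairs that were missing get null
    .join(testdf, how='left', on=['date','factor'])
    .fill_null(0)
)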
Answer 2
Score: 1
For pure readability/"polars"icity, I think
testdf.pivot(values="val", index="date", columns="factor", aggregate_function="first").melt(id_vars="date", variable_name="factor",value_name="value")
┌───────┬────────┬───────┐
│ date  ┆ factor ┆ value │
│ ---   ┆ ---    ┆ ---   │
│ str   ┆ str    ┆ i64   │
╞═══════╪════════╪═══════╡
│ date1 ┆ A      ┆ 1     │
│ date2 ┆ A      ┆ null  │
│ date3 ┆ A      ┆ null  │
│ date1 ┆ B      ┆ 2     │
│ date2 ┆ B      ┆ 3     │
│ date3 ┆ B      ┆ 1     │
│ date1 ┆ C      ┆ 3     │
│ date2 ┆ C      ┆ null  │
│ date3 ┆ C      ┆ 5     │
└───────┴────────┴───────┘
is good, since it makes it clearest what you are trying to do: make the dataframe look like a usual "melted" one. I haven't benchmarked it, though.