Fill in the previous value from a specific column based on a condition
Question
I have a Polars DataFrame that looks something like this:
```python
import polars as pl

d = pl.DataFrame(
    {
        'val': [9, 7, 9, 11, 2, 5],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
    }
)
```
I need to create a new column 'prev_val' that, for each row, holds the 'val' from the row with the same id whose 'count' is smaller by one. The result should look something like this:
```python
r = pl.DataFrame(
    {
        'val': [9, 7, 9, 11, 2, 5],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
        'prev_val': [None, 9, None, 9, None, 2]
    }
)
```
I couldn't figure out a way to do this with native expressions, so I tried a UDF, even though the Polars guide discourages the use of UDFs:
```python
# for the UDF, find the indices of the necessary columns
cols = d.columns
search_cols = ['val', 'count', 'id']
col_idx = {col: cols.index(col) for col in search_cols}

def get_previous_value(row):
    count = row[col_idx['count']]
    id_ = row[col_idx['id']]
    # get the previous count; the id remains the same
    prev_count = count - 1
    # return the value for the previous count for the same id
    return d.filter(pl.all((pl.col('count') == prev_count), pl.col('id') == id_)).select('val').first()
```
But when I try to apply the function, I get a confusing error:
```python
res = d.apply(lambda x: get_previous_value(x))
```
Out:
```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error', /home/runner/work/polars/polars/polars/polars-core/src/series/iterator.rs:70:9
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[59], line 2
----> 2 res = d.apply(lambda x: get_previous_value(x))

File ~/python/f1_car_following/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:5388, in DataFrame.apply(self, function, return_dtype, inference_size)
   5302 def apply(
   5303     self,
   5304     function: Callable[[tuple[Any, ...]], Any],
   (...)
   5307     inference_size: int = 256,
   5308 ) -> Self:
   5309     """
   5310     Apply a custom/user-defined function (UDF) over the rows of the DataFrame.
   5311
   (...)
   5386
   5387     """
-> 5388     out, is_df = self._df.apply(function, return_dtype, inference_size)
   5389     if is_df:
   5390         return self._from_pydf(out)

PanicException: assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error
```
What am I doing wrong here? It looks to me as though the assertion is checking whether the return value is a DataFrame, and I don't understand why that should be the case.
Is there perhaps a native way to do this?
Answer 1
Score: 1
It looks like a `.join`.
You could use `.unique` with `keep="last"` to generate your search space.
```python
(df.with_columns(pl.col("count") + 1)
   .unique(
      subset=["id", "count"],
      keep="last",
      maintain_order=True
   ))
```
```python
df.join(
   df.with_columns(pl.col("count") + 1)
     .unique(subset=["id", "count"], keep="last", maintain_order=True)
     .select("id", "count", prev_val = "val"),
   on=["id", "count"],
   how="left"
)
```
```
shape: (6, 4)
┌─────┬───────┬─────┬──────────┐
│ val ┆ count ┆ id  ┆ prev_val │
│ --- ┆ ---   ┆ --- ┆ ---      │
│ i64 ┆ i64   ┆ i64 ┆ i64      │
╞═════╪═══════╪═════╪══════════╡
│ 9   ┆ 1     ┆ 1   ┆ null     │
│ 7   ┆ 2     ┆ 1   ┆ 9        │
│ 9   ┆ 1     ┆ 2   ┆ null     │
│ 11  ┆ 2     ┆ 2   ┆ 9        │
│ 2   ┆ 1     ┆ 3   ┆ null     │
│ 5   ┆ 2     ┆ 3   ┆ 2        │
└─────┴───────┴─────┴──────────┘
```
df used:
```python
df = pl.from_repr("""
shape: (6, 3)
┌─────┬───────┬─────┐
│ val ┆ count ┆ id  │
│ --- ┆ ---   ┆ --- │
│ i64 ┆ i64   ┆ i64 │
╞═════╪═══════╪═════╡
│ 9   ┆ 1     ┆ 1   │
│ 7   ┆ 2     ┆ 1   │
│ 9   ┆ 1     ┆ 2   │
│ 11  ┆ 2     ┆ 2   │
│ 2   ┆ 1     ┆ 3   │
│ 5   ┆ 2     ┆ 3   │
└─────┴───────┴─────┘
""")
```