Fill in the previous value from specific column based on a condition

Question

I have a Polars DataFrame that looks something like this:
```python
import polars as pl

d = pl.DataFrame(
    {
        'val': [1, 2, 3, 4, 5, 6],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
    }
)
```

I need to create a new column 'prev_val' that holds, for each row, the 'val' from the row with the same id whose 'count' is smaller by one, i.e. something like:
```python
r = pl.DataFrame(
    {
        'val': [9, 7, 9, 11, 2, 5],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
        'prev_val': [None, 9, None, 9, None, 2]
    }
)
```

I couldn't figure out a way to do this with native expressions, so I tried a UDF, even though the Polars guide discourages their use:
```python
# for a UDF find indices for necessary columns
cols = df.columns
search_cols = ['val', 'count', 'id']
col_idx = {col: cols.index(col) for col in search_cols}

def get_previous_value(row):
    count = row[col_idx['count']]
    id_ = row[col_idx['id']]

    # get the previous count, id remains the same
    prev_count = count - 1

    # return the value for the previous count for the same id
    return df.filter(pl.all((pl.col('count') == prev_count), pl.col('id') == id_)).select('val').first()
```

But when I try to apply the function, I get a confusing error:
```python
res = d.apply(lambda x: get_previous_value(x))
```
Out:
```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error', /home/runner/work/polars/polars/polars/polars-core/src/series/iterator.rs:70:9
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[59], line 2
----> 2 res = d.apply(lambda x: get_previous_value(x))

File ~/python/f1_car_following/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:5388, in DataFrame.apply(self, function, return_dtype, inference_size)
   5302 def apply(
   5303     self,
   5304     function: Callable[[tuple[Any, ...]], Any],
   (...)
   5307     inference_size: int = 256,
   5308 ) -> Self:
   5309     """
   5310     Apply a custom/user-defined function (UDF) over the rows of the DataFrame.
   5311 
   (...)
   5386 
   5387     """
-> 5388     out, is_df = self._df.apply(function, return_dtype, inference_size)
   5389     if is_df:
   5390         return self._from_pydf(out)

PanicException: assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error
```

What am I doing wrong here? It looks to me as if the assertion is checking whether the return value is a DataFrame, and I don't understand why that should be so...

Is there maybe a native way to do this?

Answer 1

Score: 1

It looks like a .join.

You could use .unique with keep="last" to generate your search space. Adding 1 to count means each row's key matches the row that should receive its val:
```python
(df.with_columns(pl.col("count") + 1)
   .unique(
      subset=["id", "count"],
      keep="last",
      maintain_order=True
))
```
Then left-join that back onto df:
```python
df.join(
   df.with_columns(pl.col("count") + 1)
     .unique(subset=["id", "count"], keep="last", maintain_order=True)
     .select("id", "count", prev_val = "val"),
   on=["id", "count"],
   how="left"
)
```
```
shape: (6, 4)
┌─────┬───────┬─────┬──────────┐
│ val ┆ count ┆ id  ┆ prev_val │
│ --- ┆ ---   ┆ --- ┆ ---      │
│ i64 ┆ i64   ┆ i64 ┆ i64      │
╞═════╪═══════╪═════╪══════════╡
│ 9   ┆ 1     ┆ 1   ┆ null     │
│ 7   ┆ 2     ┆ 1   ┆ 9        │
│ 9   ┆ 1     ┆ 2   ┆ null     │
│ 11  ┆ 2     ┆ 2   ┆ 9        │
│ 2   ┆ 1     ┆ 3   ┆ null     │
│ 5   ┆ 2     ┆ 3   ┆ 2        │
└─────┴───────┴─────┴──────────┘
```
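
To make the join logic easier to follow, here is a minimal plain-Python sketch of the same lookup. It is only an illustration, not part of the original answer: the names `rows`, `search_space` and `prev_val` are made up for this example, and the data is copied from the sample frame.

```python
# Plain-Python sketch of the lookup that the join performs (illustration only).
rows = [
    {"val": 9, "count": 1, "id": 1},
    {"val": 7, "count": 2, "id": 1},
    {"val": 9, "count": 1, "id": 2},
    {"val": 11, "count": 2, "id": 2},
    {"val": 2, "count": 1, "id": 3},
    {"val": 5, "count": 2, "id": 3},
]

# Search space: store each val under the key (id, count + 1). A later row
# with the same key overwrites an earlier one, which mirrors keep="last".
search_space = {(row["id"], row["count"] + 1): row["val"] for row in rows}

# Left join: look up each row's own (id, count); a missing key gives None.
prev_val = [search_space.get((row["id"], row["count"])) for row in rows]
print(prev_val)
```

Rows with count 1 have no entry in the search space, which is why they end up as null in the joined frame.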

df used:

```python
df = pl.from_repr("""
shape: (6, 3)
┌─────┬───────┬─────┐
│ val ┆ count ┆ id  │
│ --- ┆ ---   ┆ --- │
│ i64 ┆ i64   ┆ i64 │
╞═════╪═══════╪═════╡
│ 9   ┆ 1     ┆ 1   │
│ 7   ┆ 2     ┆ 1   │
│ 9   ┆ 1     ┆ 2   │
│ 11  ┆ 2     ┆ 2   │
│ 2   ┆ 1     ┆ 3   │
│ 5   ┆ 2     ┆ 3   │
└─────┴───────┴─────┘
""")
```

huangapple
  • Published on April 10, 2023 at 19:56:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75976883.html