2023-04-10 19:56:09

Fill in the previous value from specific column based on a condition

Question

I have a Polars DataFrame that looks something like this:
```python
d = pl.DataFrame(
    {
        'val': [1, 2, 3, 4, 5, 6],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
    }
)
```

What I need is to create a new column `prev_val` that contains, for each row, the `val` from the row with the same `id` whose `count` is smaller by one, i.e. something like:

```python
r = pl.DataFrame(
    {
        'val': [9, 7, 9, 11, 2, 5],
        'count': [1, 2, 1, 2, 1, 2],
        'id': [1, 1, 2, 2, 3, 3],
        'prev_val': [None, 9, None, 9, None, 2]
    }
)
```

I couldn't figure out a way to do this with native expressions, so I tried a UDF, even though the Polars guide discourages the use of UDFs:

```python
# for a UDF find indices for necessary columns
cols = df.columns
search_cols = ['val', 'count', 'id']
col_idx = {col: cols.index(col) for col in search_cols}

def get_previous_value(row):
    count = row[col_idx['count']]
    id_ = row[col_idx['id']]

    # get the previous count, id remains the same
    prev_count = count - 1

    # return the value for the previous count for the same id
    return df.filter(pl.all((pl.col('count') == prev_count), pl.col('id') == id_)).select('val').first()
```

But when I try to apply the function, I get a confusing error:

```python
res = d.apply(lambda x: get_previous_value(x))
```

Out:
```
thread '<unnamed>' panicked at 'assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error', /home/runner/work/polars/polars/polars/polars-core/src/series/iterator.rs:70:9
---------------------------------------------------------------------------
PanicException                            Traceback (most recent call last)
Cell In[59], line 2
----> 2 res = d.apply(lambda x: get_previous_value(x))

File ~/python/f1_car_following/.venv/lib/python3.10/site-packages/polars/dataframe/frame.py:5388, in DataFrame.apply(self, function, return_dtype, inference_size)
   5302 def apply(
   5303     self,
   5304     function: Callable[[tuple[Any, ...]], Any],
   (...)
   5307     inference_size: int = 256,
   5308 ) -> Self:
   5309     """
   5310     Apply a custom/user-defined function (UDF) over the rows of the DataFrame.
   5311 
   (...)
   5386 
   5387     """
-> 5388     out, is_df = self._df.apply(function, return_dtype, inference_size)
   5389     if is_df:
   5390         return self._from_pydf(out)

PanicException: assertion failed: `(left == right)`
  left: `2`,
 right: `1`: impl error
```

What am I doing wrong here? It looks to me as if the assertion is checking whether the return value is a DataFrame, and I don't understand why that should be so...

Is there maybe a native way to do this?

Answer 1

Score: 1

It looks like a `.join`.

You could use `.unique` with `keep="last"` to generate your search space.

```python
(df.with_columns(pl.col("count") + 1)
   .unique(
      subset=["id", "count"],
      keep="last",
      maintain_order=True
))
```

```python
df.join(
   df.with_columns(pl.col("count") + 1)
     .unique(subset=["id", "count"], keep="last", maintain_order=True)
     .select("id", "count", prev_val = "val"),
   on=["id", "count"],
   how="left"
)
```

```
shape: (6, 4)
┌─────┬───────┬─────┬──────────┐
│ val ┆ count ┆ id  ┆ prev_val │
│ --- ┆ ---   ┆ --- ┆ ---      │
│ i64 ┆ i64   ┆ i64 ┆ i64      │
╞═════╪═══════╪═════╪══════════╡
│ 9   ┆ 1     ┆ 1   ┆ null     │
│ 7   ┆ 2     ┆ 1   ┆ 9        │
│ 9   ┆ 1     ┆ 2   ┆ null     │
│ 11  ┆ 2     ┆ 2   ┆ 9        │
│ 2   ┆ 1     ┆ 3   ┆ null     │
│ 5   ┆ 2     ┆ 3   ┆ 2        │
└─────┴───────┴─────┴──────────┘
```

df used:

```python
df = pl.from_repr("""
shape: (6, 3)
┌─────┬───────┬─────┐
│ val ┆ count ┆ id  │
│ --- ┆ ---   ┆ --- │
│ i64 ┆ i64   ┆ i64 │
╞═════╪═══════╪═════╡
│ 9   ┆ 1     ┆ 1   │
│ 7   ┆ 2     ┆ 1   │
│ 9   ┆ 1     ┆ 2   │
│ 11  ┆ 2     ┆ 2   │
│ 2   ┆ 1     ┆ 3   │
│ 5   ┆ 2     ┆ 3   │
└─────┴───────┴─────┘
""")
```
