August 10, 2023, 20:31:35

Filter on `list(Int64)` dtype in polars

Question

Say I have

In [20]: df = pl.DataFrame({'a': [[1,2,3], [1,4,2], [1,3,3]], 'b': [4,2,1]})

In [21]: df
Out[21]:
shape: (3, 2)
┌───────────┬─────┐
│ a         ┆ b   │
│ ---       ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 4   │
│ [1, 4, 2] ┆ 2   │
│ [1, 3, 3] ┆ 1   │
└───────────┴─────┘

I'd like to keep the rows where 'a' equals [1,2,3].

I've tried

In [23]: df.filter(pl.col('a')==[1,2,3])

but it raises

ArrowErrorException: NotYetImplemented("Casting from Int64 to LargeList(Field { name: \"item\", data_type: Int64, is_nullable: true, metadata: {} }) not supported")

Answer 1

Score: 2

Judging by the error, this comparison does not seem to be implemented yet.

However, you could write your own filter function, like this (it combines one boolean expression per list position into a single filter):

from functools import reduce
import polars as pl

def filterList(c: pl.Expr, l: list) -> pl.Expr:
    # AND together one element-wise comparison per position in l
    return reduce(lambda a, b: a & b, [c.list.get(idx) == item for idx, item in enumerate(l)])

or, if you prefer a plain for loop:

def filterList(c: pl.Expr, l: list) -> pl.Expr:
    # start from a literal True and AND in one comparison per position
    res = pl.lit(True)
    for idx, item in enumerate(l):
        res = res & (c.list.get(idx) == item)
    return res

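and then simply call

df.filter(filterList(pl.col('a'), [1,2,3]))

This should give the right result even if some lists in the original dataframe are shorter than the target, because .get(idx) returned null for out-of-range indices when this was written. Newer Polars releases may raise on out-of-range indices by default, so check the list.get documentation for your version. Note that only the first len(l) positions are compared, so a longer list that starts with the same values would also match.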

Answer 2

Score: 2

You can hash the list column, hash a literal holding the target list, and then compare the two:

df.filter(pl.col('a').hash() == pl.lit([[1,2,3]]).hash())
