Filter on `list(Int64)` dtype in polars
Question
Say I have
In [20]: df = pl.DataFrame({'a': [[1,2,3], [1,4,2], [1,3,3]], 'b': [4,2,1]})
In [21]: df
Out[21]:
shape: (3, 2)
┌───────────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ list[i64] ┆ i64 │
╞═══════════╪═════╡
│ [1, 2, 3] ┆ 4 │
│ [1, 4, 2] ┆ 2 │
│ [1, 3, 3] ┆ 1 │
└───────────┴─────┘
I'd like to keep rows where 'a' equals [1,2,3].
I've tried
In [23]: df.filter(pl.col('a')==[1,2,3])
but it raises
ArrowErrorException: NotYetImplemented("Casting from Int64 to LargeList(Field { name: \"item\", data_type: Int64, is_nullable: true, metadata: {} }) not supported")
Answer 1
Score: 2
Judging by the error, this comparison doesn't seem to be implemented yet.
However, you could write your own filter function, like this one (it accumulates the per-element comparisons into a single filter expression):
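import polars as pl
from functools import reduce
def filterList(c: pl.Expr, l: list) -> pl.Expr:
    # the length check stops longer lists (e.g. [1, 2, 3, 4]) from matching
    checks = [c.list.len() == len(l)] + [c.list.get(idx) == item for idx, item in enumerate(l)]
    return reduce(lambda a, b: a & b, checks)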
or, if you prefer a plain for-loop style:
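def filterList(c: pl.Expr, l: list) -> pl.Expr:
    # start from a length check instead of pl.lit(True),
    # so that longer lists (e.g. [1, 2, 3, 4]) don't match
    res = c.list.len() == len(l)
    for idx, item in enumerate(l):
        res = res & (c.list.get(idx) == item)
    return res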
and then simply call:
df.filter(filterList(pl.col('a'), [1,2,3]))
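which should give you the right result even if the list entries in the original dataframe are shorter, because `.get(idx)` would simply return null for a missing position (this holds for the Polars version the answer was written against; recent releases raise on an out-of-bounds index unless `null_on_oob=True` is passed to `.list.get`).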
Answer 2
Score: 2
You can hash the list column, hash a literal holding the target list, and compare the two hashes:
df.filter(pl.col('a').hash() == pl.lit([[1,2,3]]).hash())