Find matching pairs and lay them out as columns in polars
Question
Say I have this:
import numpy
import polars

df = polars.DataFrame(dict(
    j=numpy.random.randint(10, 99, 9),
    k=numpy.tile([1, 2, 2], 3),
))
j (i64) k (i64)
47 1
22 2
82 2
19 1
85 2
15 2
89 1
74 2
26 2
shape: (9, 2)
where column k is a kind of marker: a 1 starts a group and is followed by one or more 2s (always two in the example above for simplicity, but in practice one or more). I'd like to get the values in j that correspond to k=1 and to the last corresponding k=2. For the above:
j (i64) k (i64)
47 1 >-\
22 2 | these are the 1 and the last of its matching 2s
82 2 <-/
19 1 >-\
85 2 | these are the 1 and the last of its matching 2s
15 2 <-/
89 1 >-\
74 2 | these are the 1 and the last of its matching 2s
26 2 <-/
shape: (9, 2)
and I'd like to put these in two columns, so I get this:
j (i64) k (i64)
47 82
19 15
89 26
shape: (3, 2)
How would I approach this in polars?
Answer 1
Score: 2
You can filter simply by looking for k=1, or for rows where the next k (e.g. via a shift) is 1:
import polars as pl

df.select(
    j=pl.col('j').filter(pl.col('k') == 1),
    # the last row has no next k, so fill_null(1) treats it as a group end
    k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
)
shape: (3, 2)
┌─────┬─────┐
│ j ┆ k │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 47 ┆ 82 │
│ 19 ┆ 15 │
│ 89 ┆ 26 │
└─────┴─────┘
Answer 2
Score: 0
import polars
import numpy

def construct_example(seed, n):
    numpy.random.seed(seed)
    ks = []
    js = []
    expected_res = []
    for i in range(n):
        # each group is a single 1 followed by one to three 2s
        ntwos = numpy.random.randint(1, 4)
        ks.extend([1] + [2 for j in range(ntwos)])
        ijs = numpy.random.randint(10, 99, ntwos + 1)
        js.extend(list(ijs))
        # expected pair: j at the 1 and j at the last 2 of the group
        expected_res.append((ijs[0], ijs[-1]))
    df = polars.DataFrame(dict(j=js, k=ks))
    return df, expected_res

def solve(df):
    # sentinel row (j=None, k=1) so the last group also has a "next 1"
    jarr = list(df['j']) + [None]
    karr = list(df['k']) + [1]
    res = []
    for i, (j, k) in enumerate(zip(jarr, karr)):
        if k == 1 and j is not None:
            # the row just before the next 1 holds the group's last 2
            res.append((j, jarr[i+karr[i+1:].index(1)]))
    return res

df, expected_res = construct_example(42, 10)
assert solve(df) == expected_res
print(list(df.iter_rows()))
print(expected_res)
prints
[(61, 1), (24, 2), (81, 2), (70, 2), (92, 1), (96, 2), (84, 1), (97, 2), (33, 2), (12, 2), (62, 1), (11, 2), (97, 2), (47, 1), (11, 2), (73, 2), (42, 1), (85, 2), (31, 1), (98, 2), (58, 2), (68, 1), (51, 2), (69, 2), (89, 2), (71, 1), (71, 2), (56, 2), (71, 2), (64, 1), (73, 2), (12, 2), (60, 2)]
[(61, 70), (92, 96), (84, 12), (62, 97), (47, 73), (42, 85), (31, 58), (68, 89), (71, 71), (64, 60)]
Explanation:
The function construct_example creates example data for n row groups, where the number of 2s per group can vary, and returns the corresponding polars.DataFrame together with the expected pairs expected_res (as a list of tuples).
The function solve takes any such dataframe (assuming it contains only the two markers, has no two consecutive 1s, and ends in a 2) and computes the matches as follows:
Append an extra row with k=1 and j=None, then iterate through the rows (indexed by i). Whenever you encounter k=1 with a corresponding j that is not None, take that j as the first element and find the index of the next 1, which is i+1 plus the index of the first 1 among the values after row i. The last 2 of the group sits one row before that, so the j for the second element is at index i+karr[i+1:].index(1).
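The same pairing can also be sketched as a single pass that avoids re-scanning the tail of the list for every 1. This is not the code from the answer above but an alternative under the same assumptions (only markers 1 and 2, data starting with a 1); the function name pair_starts_with_last is hypothetical.

```python
def pair_starts_with_last(js, ks):
    """Pair the j at each k == 1 with the j of the last row before the next 1."""
    pairs = []
    start = None  # j of the currently open group, if any
    last = None   # most recent j seen so far
    for j, k in zip(js, ks):
        if k == 1:
            # A new 1 closes the previous group, if there was one.
            if start is not None:
                pairs.append((start, last))
            start = j
        last = j
    # Close the final group, which has no following 1.
    if start is not None:
        pairs.append((start, last))
    return pairs


# The example from the question: three groups of (1, 2, 2).
question_js = [47, 22, 82, 19, 85, 15, 89, 74, 26]
question_ks = [1, 2, 2, 1, 2, 2, 1, 2, 2]
print(pair_starts_with_last(question_js, question_ks))
```

It can be fed from a dataframe with pair_starts_with_last(list(df['j']), list(df['k'])), mirroring how solve reads its input.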