Find matching pairs and lay them out as columns in polars

Question

Say I have this:

import numpy
import polars

df = polars.DataFrame(dict(
    j=numpy.random.randint(10, 99, 9),
    k=numpy.tile([1, 2, 2], 3),
))

 j (i64)  k (i64)
 47       1
 22       2
 82       2
 19       1
 85       2
 15       2
 89       1
 74       2
 26       2
shape: (9, 2)

where column k is a kind of marker: a 1 starts a group and is followed by one or more 2s (always two in the example above for simplicity, but in practice one or more). I'd like to get the values in j that correspond to each k=1 and to the last of its matching k=2 rows. For the above:

 j (i64)  k (i64)
 47       1 >-\
 22       2   | these are the 1 and the last of its matching 2s
 82       2 <-/
 19       1 >-\
 85       2   | these are the 1 and the last of its matching 2s
 15       2 <-/
 89       1 >-\
 74       2   | these are the 1 and the last of its matching 2s
 26       2 <-/
shape: (9, 2)

and I'd like to put these in two columns, so I get this:

 j (i64)  k (i64)
 47       82
 19       15
 89       26
shape: (3, 2)

How would I approach this in polars?

Answer 1

Score: 2

You can do this with two filters: keep j where k is 1, and keep j where the next k (obtained with a shift) is 1:

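import polars as pl

# First column: j on the rows that open a group (k == 1).
# Second column: j on the rows whose next k is 1, i.e. the last 2 of each group;
# fill_null(1) makes the final row count as "followed by a 1".
df.select(
    j=pl.col('j').filter(pl.col('k') == 1),
    k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
)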
shape: (3, 2)
┌─────┬─────┐
│ j   ┆ k   │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 47  ┆ 82  │
│ 19  ┆ 15  │
│ 89  ┆ 26  │
└─────┴─────┘
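
To see why the "next k is 1" mask picks the last 2 of every group, it can help to replay the same logic on plain Python lists. This is only an illustrative sketch, not polars code: the helper name pair_by_shift is made up here, and it assumes k starts with a 1 so that both masks select the same number of rows.

```python
def pair_by_shift(j, k):
    """Pair each k == 1 row with the last row of its group via a 'next k' mask."""
    # Emulate shift(-1).fill_null(1): whatever follows the last row counts as a 1.
    next_k = k[1:] + [1]
    # Rows that open a group.
    starts = [jv for jv, kv in zip(j, k) if kv == 1]
    # Rows that are immediately followed by a group opener (or by the end).
    ends = [jv for jv, nk in zip(j, next_k) if nk == 1]
    return list(zip(starts, ends))


# The data from the question.
j = [47, 22, 82, 19, 85, 15, 89, 74, 26]
k = [1, 2, 2, 1, 2, 2, 1, 2, 2]
print(pair_by_shift(j, k))  # [(47, 82), (19, 15), (89, 26)]
```

Because the mask only looks at the following row, the number of 2s per group does not matter.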

Answer 2

Score: 0

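import polars
import numpy

def construct_example(seed, n):
    """Build n row groups (a 1 followed by one to three 2s) plus the expected pairs."""
    numpy.random.seed(seed)
    ks = []
    js = []
    expected_res = []
    for _ in range(n):
        # Number of 2s in this group: 1, 2 or 3.
        ntwos = numpy.random.randint(1, 4)
        ks.extend([1] + [2] * ntwos)
        ijs = numpy.random.randint(10, 99, ntwos + 1)
        js.extend(list(ijs))
        # The pair is the j of the 1 and the j of the group's last 2.
        expected_res.append((ijs[0], ijs[-1]))
    df = polars.DataFrame(dict(j=js, k=ks))
    return df, expected_res

def solve(df):
    # Sentinel row (j=None, k=1) so that the last group also has a "next 1".
    jarr = list(df['j']) + [None]
    karr = list(df['k']) + [1]
    res = []
    for i, (j, k) in enumerate(zip(jarr, karr)):
        if k == 1 and j is not None:
            # karr[i+1:].index(1) is the offset of the next 1 counted from i+1,
            # so the last 2 of this group sits at i + that offset.
            res.append((j, jarr[i + karr[i+1:].index(1)]))
    return res

df, expected_res = construct_example(42, 10)

assert solve(df) == expected_res

print(list(df.iter_rows()))
print(expected_res)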

prints

[(61, 1), (24, 2), (81, 2), (70, 2), (92, 1), (96, 2), (84, 1), (97, 2), (33, 2), (12, 2), (62, 1), (11, 2), (97, 2), (47, 1), (11, 2), (73, 2), (42, 1), (85, 2), (31, 1), (98, 2), (58, 2), (68, 1), (51, 2), (69, 2), (89, 2), (71, 1), (71, 2), (56, 2), (71, 2), (64, 1), (73, 2), (12, 2), (60, 2)]
[(61, 70), (92, 96), (84, 12), (62, 97), (47, 73), (42, 85), (31, 58), (68, 89), (71, 71), (64, 60)]

Explanation:

The function construct_example creates example data for n row groups, where the number of 2's per row group can vary, and returns the corresponding polars.DataFrame and the expected pairs expected_res (as a list of tuples).

The function solve takes any such dataframe (assuming it contains only the two markers 1 and 2, has no two consecutive 1's, and ends in a 2) and computes the matches as follows:

Append an extra row with k=1 and j=None as a sentinel, then iterate through the rows (indexed by i). Whenever you encounter k=1 with a j that is not None, take that j as the first element of the pair. The next 1 sits at index i+1+karr[i+1:].index(1), since index(1) is counted from position i+1, and the last 2 of the group is the row just before it. Hence the j for the second element sits at index i+karr[i+1:].index(1).
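
Slicing karr and calling index for every 1 rescans the tail of the list each time, so solve can become slow on long inputs. A possible single-pass variant is sketched below. It is not part of the original answer: the name solve_linear is made up for illustration, it takes the two columns as plain lists (for example list(df['j']) and list(df['k'])), and it relies on the same assumptions about the markers as solve.

```python
def solve_linear(js, ks):
    """Single pass: remember the first j of the current group and the latest j seen."""
    res = []
    in_group = False      # becomes True once the first k == 1 has been seen
    start = last = None
    for j, k in zip(js, ks):
        if k == 1:
            if in_group:
                # A new group begins, so the previous one is complete.
                res.append((start, last))
            start = j
            in_group = True
        last = j
    if in_group:
        # Flush the final group, which has no following 1 to close it.
        res.append((start, last))
    return res


# The data from the question.
js = [47, 22, 82, 19, 85, 15, 89, 74, 26]
ks = [1, 2, 2, 1, 2, 2, 1, 2, 2]
print(solve_linear(js, ks))  # [(47, 82), (19, 15), (89, 26)]
```

Each row is visited once, and no sentinel row is needed because the last group is flushed after the loop.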

huangapple
  • Published on August 9, 2023, 06:37:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863562.html