August 9, 2023, 06:37:57

Find matching pairs and lay them out as columns in polars

Question

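Assuming polars and numpy are available, one way to get the desired result is as follows.

First, create a DataFrame df containing the columns j and k with polars.DataFrame.

import polars as pl
import numpy as np

df = pl.DataFrame({
  'j': np.random.randint(10, 99, 9),
  'k': np.tile([1, 2, 2], 3)
})

Next, leave the rows in their original order. The order is what ties each 1 to the 2s that follow it, so grouping or sorting by k would lose the pairing.

Then use shift to add a column k_next holding the value of k from the following row. The last row has no successor, so its null is filled with 1 to mark it as the end of its group.

# k_next is the k value one row below; the trailing null becomes 1
marked = df.with_columns(pl.col('k').shift(-1).fill_null(1).alias('k_next'))

Use filter to pick the j values where k is 1 (the start of each group) and the j values where k_next is 1 (the last 2 of each group).

starts = pl.col('j').filter(pl.col('k') == 1)
ends = pl.col('j').filter(pl.col('k_next') == 1)

Finally, use select to place both sets of values side by side in a new DataFrame result.

result = marked.select(j=starts, k=ends)

result now holds, for each group, the j at k=1 and the j at the last corresponding k=2.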

Say I have this:

df = polars.DataFrame(dict(
  j=numpy.random.randint(10, 99, 9),
  k=numpy.tile([1, 2, 2], 3),
  ))
  
 j (i64)  k (i64)
 47       1
 22       2
 82       2
 19       1
 85       2
 15       2
 89       1
 74       2
 26       2
shape: (9, 2)

where column k is kind of a marker - 1 starts and then there are one or more 2s (in the above example always two for simplicity, but in practice one or more). I'd like to get values in j that correspond to k=1 and the last corresponding k=2. For the above:

 j (i64)  k (i64)
 47       1 >-\
 22       2   | these are the 1 and the last of its matching 2s
 82       2 <-/
 19       1 >-\
 85       2   | these are the 1 and the last of its matching 2s
 15       2 <-/
 89       1 >-\
 74       2   | these are the 1 and the last of its matching 2s
 26       2 <-/
shape: (9, 2)

and I'd like to put these in two columns, so I get this:

 j (i64)  k (i64)
 47       82
 19       15
 89       26
shape: (3, 2)

How would I approach this in polars?

Answer 1

Score: 2

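You can do this with a plain filter (with polars imported as pl): take j where k is 1, and take j where the next k, obtained with shift(-1), is 1. The fill_null(1) treats the last row, which has no next k, as the end of a group:

df.select(
    j=pl.col('j').filter(pl.col('k') == 1),
    k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
)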

shape: (3, 2)
┌─────┬─────┐
│ j   ┆ k   │
│ --- ┆ --- │
│ i32 ┆ i32 │
╞═════╪═════╡
│ 47  ┆ 82  │
│ 19  ┆ 15  │
│ 89  ┆ 26  │
└─────┴─────┘

Answer 2

Score: 0

import polars
import numpy

def construct_example(seed, n):
    numpy.random.seed(seed)
    ks = []
    js = []
    expected_res = []
    for i in range(n):
        ntwos = numpy.random.randint(1, 4)
        ks.extend([1] + [2 for j in range(ntwos)])
        ijs = numpy.random.randint(10, 99, ntwos + 1)
        js.extend(list(ijs))
        expected_res.append((ijs[0], ijs[-1]))
    df = polars.DataFrame(dict(j=js, k=ks))
    return df, expected_res

def solve(df):
    jarr = list(df['j']) + [None]
    karr = list(df['k']) + [1]
    res = []
    for i, (j, k) in enumerate(zip(jarr, karr)):
        if k == 1 and j is not None:
            res.append((j, jarr[i+karr[i+1:].index(1)]))
    return res

df, expected_res = construct_example(42, 10)

assert solve(df) == expected_res

print(list(df.iter_rows()))
print(expected_res)

prints

[(61, 1), (24, 2), (81, 2), (70, 2), (92, 1), (96, 2), (84, 1), (97, 2), (33, 2), (12, 2), (62, 1), (11, 2), (97, 2), (47, 1), (11, 2), (73, 2), (42, 1), (85, 2), (31, 1), (98, 2), (58, 2), (68, 1), (51, 2), (69, 2), (89, 2), (71, 1), (71, 2), (56, 2), (71, 2), (64, 1), (73, 2), (12, 2), (60, 2)]
[(61, 70), (92, 96), (84, 12), (62, 97), (47, 73), (42, 85), (31, 58), (68, 89), (71, 71), (64, 60)]

Explanation:

The function construct_example creates example data for n row groups, where the number of 2's per row groups can vary, and returns the corresponding polars.DataFrame and the expected pairs expected_res (as a list of tuples).

The function solve takes any such dataframe (assuming it satisfies the conditions of having only the said two markers, no two consecutive 1's and ending in a 2) and computes the matches as follows:

Append a sentinel row with k=1 and j=None, then iterate through the rows (indexed by i). Whenever a row has k=1 and a j that is not None, take that j as the first element of the pair. The next 1 sits at index i+1+karr[i+1:].index(1), because index is evaluated on the slice that starts at i+1. The last 2 of the group is the row just before that, hence the j for the second element must sit at index i+karr[i+1:].index(1).
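
The slice-and-index lookup rescans the rest of the list for every 1. As a sketch of an alternative under the same assumptions (only the markers 1 and 2, no two consecutive 1's, data ending in a 2), the pairs can also be collected in a single pass by remembering the previous row. The function name last_of_each_group is hypothetical and not part of the answer above.

```python
def last_of_each_group(js, ks):
    """Pair the j of each k == 1 row with the j of the last row of its group."""
    pairs = []
    start = None  # j of the most recent k == 1 row
    prev = None   # j of the previous row
    for j, k in zip(js, ks):
        if k == 1:
            # A new group begins, so the previous row closed the old one.
            if start is not None:
                pairs.append((start, prev))
            start = j
        prev = j
    # The final group is closed by the end of the data.
    if start is not None:
        pairs.append((start, prev))
    return pairs


# First three groups of the printed example above.
js = [61, 24, 81, 70, 92, 96, 84, 97, 33, 12]
ks = [1, 2, 2, 2, 1, 2, 1, 2, 2, 2]
print(last_of_each_group(js, ks))
```

With a dataframe this could be called as last_of_each_group(list(df['j']), list(df['k'])), mirroring how solve builds its lists.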
