Find matching pairs and lay them out as columns in polars

Question

Say I have this:

    import numpy
    import polars

    df = polars.DataFrame(dict(
        j=numpy.random.randint(10, 99, 9),
        k=numpy.tile([1, 2, 2], 3),
    ))

    j (i64)  k (i64)
    47       1
    22       2
    82       2
    19       1
    85       2
    15       2
    89       1
    74       2
    26       2
    shape: (9, 2)

where column k is kind of a marker - 1 starts and then there are one or more 2s (in the above example always two for simplicity, but in practice one or more). I'd like to get values in j that correspond to k=1 and the last corresponding k=2. For the above:

    j (i64)  k (i64)
    47       1   >-\
    22       2     | these are the 1 and the last of its matching 2s
    82       2   <-/
    19       1   >-\
    85       2     | these are the 1 and the last of its matching 2s
    15       2   <-/
    89       1   >-\
    74       2     | these are the 1 and the last of its matching 2s
    26       2   <-/
    shape: (9, 2)

and I'd like to put these in two columns, so I get this:

    j (i64)  k (i64)
    47       82
    19       15
    89       26
    shape: (3, 2)

How would I approach this in polars?

Answer 1

Score: 2

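You can do this with two filters on j: keep the rows where k is 1, and keep the rows where the next k (obtained with shift(-1)) is 1. The fill_null(1) treats the end of the frame as the start of a new group, so the very last row is kept as well: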

    # assumes `import polars as pl` and the df from the question
    df.select(
        j=pl.col('j').filter(pl.col('k') == 1),
        k=pl.col('j').filter(pl.col('k').shift(-1).fill_null(1) == 1),
    )

    shape: (3, 2)
    ┌─────┬─────┐
    │ j   ┆ k   │
    │ --- ┆ --- │
    │ i32 ┆ i32 │
    ╞═════╪═════╡
    │ 47  ┆ 82  │
    │ 19  ┆ 15  │
    │ 89  ┆ 26  │
    └─────┴─────┘

Answer 2

Score: 0

    import polars
    import numpy

    def construct_example(seed, n):
        """Build n row groups (one 1 followed by 1 to 3 twos) and the expected pairs."""
        numpy.random.seed(seed)
        ks = []
        js = []
        expected_res = []
        for i in range(n):
            ntwos = numpy.random.randint(1, 4)  # number of 2s in this group
            ks.extend([1] + [2 for j in range(ntwos)])
            ijs = numpy.random.randint(10, 99, ntwos + 1)
            js.extend(list(ijs))
            # first j belongs to the 1, last j to the final 2 of the group
            expected_res.append((ijs[0], ijs[-1]))
        df = polars.DataFrame(dict(j=js, k=ks))
        return df, expected_res

    def solve(df):
        # append a sentinel row (j=None, k=1) so the last group is closed by a 1
        jarr = list(df['j']) + [None]
        karr = list(df['k']) + [1]
        res = []
        for i, (j, k) in enumerate(zip(jarr, karr)):
            if k == 1 and j is not None:
                # karr[i+1:].index(1) is the offset from i to the row just before the next 1
                res.append((j, jarr[i+karr[i+1:].index(1)]))
        return res

    df, expected_res = construct_example(42, 10)
    assert solve(df) == expected_res
    print(list(df.iter_rows()))
    print(expected_res)

prints

    [(61, 1), (24, 2), (81, 2), (70, 2), (92, 1), (96, 2), (84, 1), (97, 2), (33, 2), (12, 2), (62, 1), (11, 2), (97, 2), (47, 1), (11, 2), (73, 2), (42, 1), (85, 2), (31, 1), (98, 2), (58, 2), (68, 1), (51, 2), (69, 2), (89, 2), (71, 1), (71, 2), (56, 2), (71, 2), (64, 1), (73, 2), (12, 2), (60, 2)]
    [(61, 70), (92, 96), (84, 12), (62, 97), (47, 73), (42, 85), (31, 58), (68, 89), (71, 71), (64, 60)]

Explanation:

The function construct_example creates example data for n row groups, where the number of 2's per row group can vary, and returns the corresponding polars.DataFrame and the expected pairs expected_res (as a list of tuples).

The function solve takes any such dataframe (assuming it satisfies the conditions of having only the said two markers, no two consecutive 1's and ending in a 2) and computes the matches as follows:

Add an extra row with k=1 and j=None, then iterate through the rows (indexed by i). Whenever you encounter a k=1 with a corresponding j that is not None, take that j as the first element. Then find the next 1: it sits at index i+1 plus the index of the first 1 when considering only the values below/after row i. The last 2 of the group is the row just before that next 1, hence the j for the second element must sit at index i+karr[i+1:].index(1).
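
To make the index arithmetic concrete, here is a small sketch on the first two groups of the printed data. It also shows a single-pass variant that avoids rescanning the list for every 1. The helper name pair_starts_with_last is hypothetical and is not part of the answer; it works on plain Python lists, so it would be called with list(df['j']) and list(df['k']).

```python
def pair_starts_with_last(js, ks):
    """Pair the j of each k == 1 row with the j of the last k == 2 row of its group.

    Hypothetical helper (not from the answer): one pass, no list slicing.
    """
    pairs = []
    start = None  # j of the most recent k == 1 row
    last = None   # j of the most recent k == 2 row in the current group
    for j, k in zip(js, ks):
        if k == 1:
            # a new group begins: close the previous one if it had any 2s
            if start is not None and last is not None:
                pairs.append((start, last))
            start, last = j, None
        else:
            last = j
    # close the final group (the data is assumed to end in a 2)
    if start is not None and last is not None:
        pairs.append((start, last))
    return pairs


# First two groups of the data printed above.
js = [61, 24, 81, 70, 92, 96]
ks = [1, 2, 2, 2, 1, 2]

# Index arithmetic from the answer, for the group starting at i = 0:
karr = ks + [1]              # the sentinel 1 closes the last group
offset = karr[1:].index(1)   # 3, so the next 1 is at index 0 + 1 + 3 = 4
assert js[0 + offset] == 70  # and the last 2 of the group is at index 3

print(pair_starts_with_last(js, ks))  # [(61, 70), (92, 96)]
```

The slicing version in solve is quadratic in the worst case because every 1 triggers a new search; the single pass does the same pairing in linear time, at the cost of a few more lines.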

huangapple
  • Posted on August 9, 2023, 06:37:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76863562.html