Pythonic way to group values from a list based on values from another list

huangapple go评论55阅读模式
英文:

Pythonic way to group values from a list based on values from another list

问题

有两个列表:

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]

我想生成一个列表的列表,通过确定它们是否在列表A的值之间,将列表B中的值分组。所以在这个例子中,列表B将被分成:

[[2,19,23], [26,30,32,34,36]]

在Python中有没有一种干净的方法来实现这一点,而不使用多层嵌套的for循环?

尝试了一个混乱的双层嵌套循环结构,对它的臃肿程度感到不满(由于可读性差)。

英文:

I have 2 lists:

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]

I want to generate a list of lists such that I group values in list B by determining if they are in between values in list A. So in this example, list B would be grouped into:

[[2,19,23], [26,30,32,34,36]]

Is there any clean way in python to achieve this without multiple nested for loops?

Tried a messy double nested loop structure, was not pleased with how clunky it was (due to lack of readability).

答案1

得分: 0

这是我能想到的编写代码的最简单方法。

result = []
for start, end in zip(List_A, List_A[1:]):
    result.append([i for i in List_B if start <= i < end])

这是O(NxM)的时间复杂度,所以对于大型列表来说效率不是很高。

如果对List_B进行排序(我假设List_A已经排序),并同时遍历它们,可以使代码更高效,但会更加复杂。

英文:

This is the simplest way I can think of to code it.

result = []
for start, end in zip(List_A, List_A[1:]):
    result.append([i for i in List_B if start &lt;= i &lt; end])

It's O(NxM), so not very efficient for large lists.

You could make it more efficient by sorting List_B (I assume List_A is already sorted) and stepping through both of them together, but it will be more complicated.

答案2

得分: 0

根据插入到List_A中的索引,将List_B分组。标准库提供了bisect模块中的功能,通过使用标准的二分算法来确定值应该放在哪里;它还在itertools模块中提供了根据某个谓词("key"函数)将输入序列中的相邻值分组的功能。

代码示例如下:

from bisect import bisect
from itertools import groupby

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
groups = groupby(List_B, key=lambda x: bisect(List_A, x))
print([list(group) for key, group in groups])

这将得到所请求的结果:[[2, 19, 23], [26, 30, 32, 34, 36]]

bisect.bisectbisect.bisect_right 的别名;也就是说,List_B 中与 List_A 中的值相等的值将放在稍后列表的开头。如果要将其放在前一个列表的末尾,可以使用 bisect.bisect_left

bisect.bisect 也依赖于List_A自然排序。

itertools.groupby 将分组相邻的值;对于属于同一“bin”但被不同“bin”的值分开的值,它将创建单独的组。如果这是一个问题,请先对输入进行排序。

这将具有时间复杂度 O(N * lg M),其中 N 是 List_B 的长度,M 是 List_A 的长度。也就是说:在箱的数量中找到一个箱的时间复杂度为对数级别,这个工作对每个要进行分组的值都会重复进行。

这不会生成空列表,如果有一个应该是空的箱;在这个示例中,列表推导式会忽略对 List_A 的实际索引。

英文:

Group List_B according to the index that they would have, if inserted into List_A. The standard library provides functionality in the bisect module to figure out (by using a standard bisection algorithm) where the value would go; it provides functionality in the itertools module to group adjacent values in an input sequence, according to some predicate ("key" function).

This looks like:

from bisect import bisect
from itertools import groupby

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
groups = groupby(List_B, key=lambda x: bisect(List_A, x))
print([list(group) for key, group in groups])

which gives [[2, 19, 23], [26, 30, 32, 34, 36]] as requested.

bisect.bisect is an alias for bisect.bisect_right; that is, a value in List_B that is equal to a value from List_A will be put at the beginning of a later list. To have it as the end of the previous list instead, use bisect.bisect_left.

bisect.bisect also relies on List_A being sorted, naturally.

itertools.groupby will group adjacent values; it will make separate groups for values that belong in the same "bin" but are separated by values that belong in a different "bin". If this is an issue, sort the input first.

This will be O(N * lg M) where N is the length of List_B and M is the length of List_A. That is: finding a bin takes logarithmic time in the number of bins, and this work is repeated for each value to be binned.

This will not generate empty lists if there is a bin that should be empty; the actual indices into List_A are ignored by the list comprehension in this example.

huangapple
  • 本文由 发表于 2023年2月16日 07:14:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75466296.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定