2023年2月16日 07:14:34go评论59阅读模式

英文:

Pythonic way to group values from a list based on values from another list

问题

有两个列表：

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]

我想生成一个列表的列表，通过确定它们是否在列表A的值之间，将列表B中的值分组。所以在这个例子中，列表B将被分成：

[[2,19,23], [26,30,32,34,36]]

在Python中有没有一种干净的方法来实现这一点，而不使用多层嵌套的for循环？

尝试了一个混乱的双层嵌套循环结构，对它的臃肿程度感到不满（由于可读性差）。

英文:

I have 2 lists:

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]

I want to generate a list of lists such that I group values in list B by determining if they are in between values in list A. So in this example, list B would be grouped into:

[[2,19,23], [26,30,32,34,36]]

Is there any clean way in python to achieve this without multiple nested for loops?

Tried a messy double nested loop structure, was not pleased with how clunky it was (due to lack of readability).

答案1

得分: 0

这是我能想到的编写代码的最简单方法。

result = []
for start, end in zip(List_A, List_A[1:]):
    result.append([i for i in List_B if start <= i < end])

这是O(NxM)的时间复杂度，所以对于大型列表来说效率不是很高。

如果对List_B进行排序（我假设List_A已经排序），并同时遍历它们，可以使代码更高效，但会更加复杂。

英文:

This is the simplest way I can think of to code it.

result = []
for start, end in zip(List_A, List_A[1:]):
    result.append([i for i in List_B if start &lt;= i &lt; end])

It's O(NxM), so not very efficient for large lists.

You could make it more efficient by sorting List_B (I assume List_A is already sorted) and stepping through both of them together, but it will be more complicated.

答案2

得分: 0

根据插入到List_A中的索引，将List_B分组。标准库提供了bisect模块中的功能，通过使用标准的二分算法来确定值应该放在哪里；它还在itertools模块中提供了根据某个谓词（"key"函数）将输入序列中的相邻值分组的功能。

代码示例如下：

from bisect import bisect
from itertools import groupby

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
groups = groupby(List_B, key=lambda x: bisect(List_A, x))
print([list(group) for key, group in groups])

这将得到所请求的结果：[[2, 19, 23], [26, 30, 32, 34, 36]]。

bisect.bisect 是 bisect.bisect_right 的别名；也就是说，List_B 中与 List_A 中的值相等的值将放在稍后列表的开头。如果要将其放在前一个列表的末尾，可以使用 bisect.bisect_left。

bisect.bisect 也依赖于List_A自然排序。

itertools.groupby 将分组相邻的值；对于属于同一“bin”但被不同“bin”的值分开的值，它将创建单独的组。如果这是一个问题，请先对输入进行排序。

这将具有时间复杂度 O(N * lg M)，其中 N 是 List_B 的长度，M 是 List_A 的长度。也就是说：在箱的数量中找到一个箱的时间复杂度为对数级别，这个工作对每个要进行分组的值都会重复进行。

这不会生成空列表，如果有一个应该是空的箱；在这个示例中，列表推导式会忽略对 List_A 的实际索引。

英文:

Group List_B according to the index that they would have, if inserted into List_A. The standard library provides functionality in the bisect module to figure out (by using a standard bisection algorithm) where the value would go; it provides functionality in the itertools module to group adjacent values in an input sequence, according to some predicate ("key" function).

This looks like:

from bisect import bisect
from itertools import groupby

List_A = [1, 25, 40]
List_B = [2, 19, 23, 26, 30, 32, 34, 36]
groups = groupby(List_B, key=lambda x: bisect(List_A, x))
print([list(group) for key, group in groups])

which gives [[2, 19, 23], [26, 30, 32, 34, 36]] as requested.

bisect.bisect is an alias for bisect.bisect_right; that is, a value in List_B that is equal to a value from List_A will be put at the beginning of a later list. To have it as the end of the previous list instead, use bisect.bisect_left.

bisect.bisect also relies on List_A being sorted, naturally.

itertools.groupby will group adjacent values; it will make separate groups for values that belong in the same "bin" but are separated by values that belong in a different "bin". If this is an issue, sort the input first.

This will be O(N * lg M) where N is the length of List_B and M is the length of List_A. That is: finding a bin takes logarithmic time in the number of bins, and this work is repeated for each value to be binned.

This will not generate empty lists if there is a bin that should be empty; the actual indices into List_A are ignored by the list comprehension in this example.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Pythonic way to group values from a list based on values from another list

问题

答案1

答案2

按照 pandas 数据框中的列进行分组和聚合。

使用pandas的assign方法中的random模块为什么返回相同的数字？

将DataFrame从一行排列到多列中的Python代码。

在删除重复序列的fasta文件时出现错误 – 标头存在问题

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论