How to find the largest intersection among multiple subsets (most common element in n out of k sets)?

Question

Just an example:

set_0={0,3,4}
set_1={1,3,4}
set_2={1,5,23,8,24}
set_4={1,2,6,10}
set_5={1,60,34,2}
set_6={1,45,32,4}
set_7={1,6,9,14}
set_8={1,56,3,23}
set_9={1,34,23,3}
all_intersection=set.intersection(set_0,set_1,set_2,set_3,set_4, set_5, set_6, set_7, set_8, set_9)

gives an empty set. Is there a Pythonic way (ideally without brute force) to find the intersection across all possible combinations of 9 out of the 10 sets, i.e. the elements that appear in at least 9 of them?

For this dataset I would expect to retrieve 1.
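For comparison, the brute-force approach can be sketched with itertools.combinations: intersect every combination of n sets and union the results. Any value found this way occurs in at least n sets, and every such value shows up in at least one combination. The helper name is hypothetical, and because set_3 is not defined in the example, the sketch uses the nine sets shown with n = 8 (all but one). It performs C(k, n) intersections, which grows quickly as k increases.

```python
from itertools import combinations

# The nine sets from the example (set_3 is not defined there).
sets = [
    {0, 3, 4}, {1, 3, 4}, {1, 5, 23, 8, 24}, {1, 2, 6, 10},
    {1, 60, 34, 2}, {1, 45, 32, 4}, {1, 6, 9, 14},
    {1, 56, 3, 23}, {1, 34, 23, 3},
]


def in_at_least_n_bruteforce(sets, n):
    """Hypothetical helper: values present in at least n (n >= 1) of the sets."""
    result = set()
    # Every combination of n sets; a value common to one of them
    # is, by definition, contained in at least n sets.
    for combo in combinations(sets, n):
        result |= set.intersection(*combo)
    return result


print(in_at_least_n_bruteforce(sets, len(sets) - 1))  # {1}
```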

Answer 1

Score: 1

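Calling intersection through the class set is not the problem in itself: set.intersection(set_0, set_1, ...) uses the first set as self and works. It only fails (with a TypeError about the unbound method, which older Python versions describe as a descriptor) when no set is passed at all. In the snippet as posted, that line would actually raise a NameError, because set_3 is never defined.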
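A quick check of how set.intersection behaves when called through the class, using three small made-up sets: it works whenever the first argument is a set (that set becomes self), and raises TypeError when no set is passed, which is exactly what happens when an empty list is unpacked into it. The exact error message varies between Python versions.

```python
a = {1, 3, 4}
b = {1, 5, 23}
c = {1, 2, 6}

# Called through the class, the first argument is used as `self`.
print(set.intersection(a, b, c))  # {1}
# Same result as calling the method on an instance.
print(a.intersection(b, c))  # {1}

# With no arguments there is no `self`, so the call fails.
try:
    set.intersection()
except TypeError as exc:
    print("TypeError:", exc)

# Unpacking an empty list hits the same error, so guard for it.
list_of_sets = []
common = set.intersection(*list_of_sets) if list_of_sets else set()
print(common)  # set()
```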

To intersect, you need a set to serve as the starting point (below, the union of all the sets). Even so, an intersection will not find the most common value; it only tells you which values appear in every set. The Counter class from the collections module can tell you which values are the most common.

from collections import Counter

set_0 = {0, 3, 4}
set_1 = {1, 3, 4}
set_2 = {1, 5, 23, 8, 24}
set_4 = {1, 2, 6, 10}  # you're missing set_3
set_5 = {1, 60, 34, 2}
set_6 = {1, 45, 32, 4}
set_7 = {1, 6, 9, 14}
set_8 = {1, 56, 3, 23}
set_9 = {1, 34, 23, 3}

my_sets = (set_0, set_1, set_2, set_4, set_5, set_6, set_7, set_8, set_9)
values_of_interest = set().union(*my_sets)

values_shared_among_all_sets = values_of_interest.intersection(*my_sets)
counter = Counter(item for collection in my_sets for item in collection)
the_5_most_common_values = counter.most_common(5)

print(f"values in all sets: {values_shared_among_all_sets}")
print(f"most common 5 values: {the_5_most_common_values}")

# Output:
# values in all sets: set()
# most common 5 values: [(1, 8), (3, 4), (4, 3), (23, 3), (2, 2)]
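Building on the Counter idea, here is a minimal sketch that answers the title directly (values found in at least n of k sets) without enumerating any combinations. The helper in_at_least_n is hypothetical. It relies on each set holding a value at most once, so a value's count equals the number of sets that contain it.

```python
from collections import Counter

# The nine sets from the question (set_3 is not defined there).
sets = [
    {0, 3, 4}, {1, 3, 4}, {1, 5, 23, 8, 24}, {1, 2, 6, 10},
    {1, 60, 34, 2}, {1, 45, 32, 4}, {1, 6, 9, 14},
    {1, 56, 3, 23}, {1, 34, 23, 3},
]


def in_at_least_n(sets, n):
    """Hypothetical helper: values contained in at least n of the given sets."""
    # Each set holds a value at most once, so a value's count equals the
    # number of sets that contain it. If the inputs could be lists with
    # repeated items, iterate over set(s) instead.
    counts = Counter(value for s in sets for value in s)
    return {value for value, count in counts.items() if count >= n}


print(in_at_least_n(sets, len(sets) - 1))  # {1}

# The single most common value and the number of sets it appears in.
print(Counter(v for s in sets for v in s).most_common(1))  # [(1, 8)]
```

With n = len(sets) this gives the same result as the plain intersection (here an empty set); with n = len(sets) - 1 it covers the "all but one" case. The work grows with the total number of elements rather than with the number of combinations.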

huangapple
  • Published on 2023-02-08 18:28:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75384429.html