How to find the largest intersection among multiple subsets (most common element in n out of k sets)?
Question
Just an example:
set_0={0,3,4}
set_1={1,3,4}
set_2={1,5,23,8,24}
set_4={1,2,6,10}
set_5={1,60,34,2}
set_6={1,45,32,4}
set_7={1,6,9,14}
set_8={1,56,3,23}
set_9={1,34,23,3}
all_intersection=set.intersection(set_0,set_1,set_2,set_3,set_4, set_5, set_6, set_7, set_8, set_9)
gives an empty set. Is there a Pythonic way to find the intersection among all possible combinations of 9 out of 10 sets (ideally without the brute-force approach)?
For this dataset I would expect to retrieve 1.
Answer 1
Score: 1
Calling intersection on the class set does work, because the first argument is used as self, but as posted the code raises a NameError since set_3 is never defined. You can also pick one set as the basis and call intersection on it with the others.
Either way, that intersection won't find the most common value; it only tells you which values appear in every one of the sets. The Counter class from collections can tell you which values are the most common.
from collections import Counter
set_0 = {0, 3, 4}
set_1 = {1, 3, 4}
set_2 = {1, 5, 23, 8, 24}
set_4 = {1, 2, 6, 10} # you're missing set_3
set_5 = {1, 60, 34, 2}
set_6 = {1, 45, 32, 4}
set_7 = {1, 6, 9, 14}
set_8 = {1, 56, 3, 23}
set_9 = {1, 34, 23, 3}
my_sets = (set_0, set_1, set_2, set_4, set_5, set_6, set_7, set_8, set_9)
values_of_interest = set().union(*my_sets)
values_shared_among_all_sets = values_of_interest.intersection(*my_sets)
counter = Counter(item for collection in my_sets for item in collection)
the_5_most_common_values = counter.most_common(5)
print(f"values in all sets: {values_shared_among_all_sets}")
print(f"most common 5 values: {the_5_most_common_values}")
# this is the output:
# values in all sets: set()
# most common 5 values: [(1, 8), (3, 4), (4, 3), (23, 3), (2, 2)]
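To get from the counts to the "n out of k" result the question asks for, one option is to keep only the values whose count reaches a threshold. Because a set holds each value at most once, a value's count equals the number of sets that contain it, so this should match the union of the intersections over every combination of n sets without enumerating those combinations. The following is a minimal sketch along those lines, not part of the original answer; the function name elements_in_at_least and its parameter n are made up for illustration.

```python
from collections import Counter


def elements_in_at_least(sets, n):
    """Return the elements that occur in at least n of the given sets.

    Hypothetical helper: each set contributes an element at most once,
    so the Counter value is the number of sets containing that element.
    """
    counts = Counter(item for s in sets for item in s)
    return {item for item, count in counts.items() if count >= n}


# The nine sets actually defined in the question (set_3 is missing there).
my_sets = [
    {0, 3, 4},
    {1, 3, 4},
    {1, 5, 23, 8, 24},
    {1, 2, 6, 10},
    {1, 60, 34, 2},
    {1, 45, 32, 4},
    {1, 6, 9, 14},
    {1, 56, 3, 23},
    {1, 34, 23, 3},
]

# Values present in at least 8 of the 9 sets.
print(elements_in_at_least(my_sets, 8))  # {1}
```

With only nine sets defined, the threshold here is 8 rather than 9; with all ten sets it would be called with n=9. This makes a single pass over all elements, whereas the brute-force version intersects every combination of n sets.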