Pandas:具有最高覆盖率的组合

huangapple go评论58阅读模式
英文:

Pandas: combination with highest coverage

问题

在超市里,我选择了30种产品,我们想要进行分析。我想要看到其中的12种产品在特定日期(不涉及时间)内覆盖了最多的客户。

这意味着我有30!/(12!(30-12)!) = 86493225 种产品组合。

我的pandas客户购买数据框:

客户 产品
A 香蕉
B 苹果
B 香蕉
C
...

现在我可以迭代查看具有最多客户数量的组合,首先使用itertools创建它们:

comb = set(itertools.combinations([香蕉,...], 12))
d = {}
for i in comb:
    d[i] = df[df.product.isin(i)].Client.nunique()

但这将花费大量的时间。

你们有没有看到更好的方法来计算这个?

请注意,我不想找到12种最常见的产品的组合,而是能够产生最多客户的组合(可能相同,但不一定,因为2种不常见的产品如果两个组不重叠,可能会产生更多客户)。

有什么想法吗?

谢谢

英文:

In a supermarket I selected 30 products on which we want to run an analysis. I want to see which 12 of them give me the widest coverage of clients (on a specific date, no time involved).

This means that I have 30!/(12!(30-12)!) = 86493225 combinations of products

My pandas dataframe of clients purchases:

Client Product
A Banana
B Apple
B Banana
C Water
...

now I could iterate to see the combination of highest client count, creating them all first with itertools

comb = set(itertools.combinations([banana,...], 12))
d = {}
for i in comb:
    d[i] = df[df.product.isin(i)].Client.nunique()

but this will take a spectacular amount of time.

do you guys see any better way to count this out?

please note that I do not want to find the combination of 12 most common products, but the combination that will yield the most clients (possibly the same but not necessarily, as 2 products not individually common may yield more clients if the 2 groups don't overlap much).

any thoughts?

thank you

答案1

得分: 1

这是最大覆盖问题,它是NP难题。该链接展示了一种贪婪算法,它提供了一个近似解决方案:在每一步中,添加具有最多新客户的产品。

英文:

This is the maximum coverage problem and it is NP-hard. That link shows a greedy algorithm which provides an approximate solution: at each step add the product with the most new clients.

huangapple
  • 本文由 发表于 2023年3月9日 16:59:32
  • 转载请务必保留本文链接:https://go.coder-hub.com/75682355.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定