没有多重集合的k-组合的通用解决方案?

huangapple go评论77阅读模式
英文:

No general solution to k-combination of multiset?

问题

以下是您要翻译的内容:


The problem to solve

解决可能包含重复项的 k-组合问题,并仅返回唯一的组合。

例如,输入为 [0,1,2,2,4],k = 2,结果为:

{(0, 1), (2, 4), (1, 2), (0, 4), (1, 4), (0, 2), (2, 2)}


The solution I can think of

我能想到的唯一通用解决方案是使用 setmap 来提取唯一结果。

例如,在Python中:

  1. from itertools import combinations
  2. print(set(combinations([0, 1, 2, 2, 4], 2)))

Questions

  • 是否有一种通用解决方案可以在不使用 set/map 进行过滤的情况下完成此操作。

我可以生成按顺序的唯一项组合(参考:https://stackoverflow.com/a/76048486),但如果存在重复项,则会有重复的组合。

我阅读了这个:https://math.stackexchange.com/a/554237,似乎它说没有一种通用解决方案可以获取所有唯一的组合,尽管有一种获得唯一组合数量的公式。

但是,我不确定。


@Update - Summary & tips

  • 根据答案,可以通过迭代和递归两种方式来完成此操作。
  • python翻译为go时:
    • 在Python中的yield需要在Go中使用chan,或者需要返回所有组合(对于大量输入可能很大)。
    • Python中的else用于forwhile循环,当使用break时在Go中会省略。
    • Python中的列表切片会创建一个副本(或引用?),而在Go中切片一个切片将重用相同的底层数组(因此在Go中,如果要复制的切片稍后将被修改,而且不希望在副本中出现该更改,则应将其复制/附加到新切片中)。
英文:

The problem to solve

Solve k-combination whose input may contain repeated items, and return only unique combinations.

e.g input is [0,1,2,2,4], k = 2, the result is:
> {(0, 1), (2, 4), (1, 2), (0, 4), (1, 4), (0, 2), (2, 2)}


The solution I can think of

The only general solution I can think is to use a set or map to extract unique results.

e.g with python
<!-- language: python -->

  1. from itertools import combinations
  2. print (set(combinations([0, 1, 2, 2, 4], 2)))

Questions

  • Is there a general solution to do this, without using a set/map for filtering.

I can generate combinations of unique items, in order (refer: https://stackoverflow.com/a/76048486), but if there are repeated items, then there will be repeated combinations.

I've read this: https://math.stackexchange.com/a/554237 , seems it says there is no general solution to get all unique combinations, though there do have a formula to get count of unique combinations.

But, I'm not sure.


@Update - Summary & tips

  • According to the answers, there are ways both via iteration & recursion to do this.
  • When translating from python to go:
    • yield in python, need chan in go, or u need return all combinations (which can be huge for large input).
    • python's else for for and while is omitted when break.
    • list slicing in python will make a copy (of reference ?), while slicing a go slice will reuse the same underlying array, (so in go, if the slice being copied will be modified later, and u don't want that change in the copy, then u should copy/append it into a new slice).

答案1

得分: 3

以下是提供的Python代码的中文翻译:

这是一个按词典顺序优化简单性而不是速度的非递归组合枚举器。Knuth的《计算机程序设计艺术》几乎肯定有一个速度优化的版本。

  1. def combinations(a, k):
  2. a = sorted(a)
  3. while True:
  4. yield a[:k]
  5. for i in range(k - 1, -1, -1):
  6. # a[i + 1 :] 已排序。对 a[i:] 进行排序。
  7. x = a[i]
  8. j = i + 1
  9. while j < len(a):
  10. if x < a[j]:
  11. break
  12. a[j - 1] = a[j]
  13. j += 1
  14. a[j - 1] = x
  15. # a[j] 是 x 之后的下一个最大元素。如果后面有足够多的元素,将它们旋转到适当位置。
  16. if j + (k - i) <= len(a):
  17. rotate(a, i, j, j + (k - i))
  18. break
  19. else:
  20. break
  21. def rotate(a, i, j, k):
  22. reverse(a, i, j)
  23. reverse(a, j, k)
  24. reverse(a, i, k)
  25. def reverse(a, i, k):
  26. for j in range((k - i) // 2):
  27. a[i + j], a[(k - 1) - j] = a[(k - 1) - j], a[i + j]
  28. print(*combinations([0, 1, 2, 2, 4], 2))
  29. print(*combinations([0, 1, 2, 2, 4], 3))
  30. print(*combinations([0, 0, 0, 1, 1, 1], 3))

输出:

  1. [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
  2. [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
  3. [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]
英文:

Here’s a non-recursive enumerator for combinations in lexicographic order, optimized for simplicity over speed. Knuth’s The Art of Computer Programming almost certainly has a speed-optimized version.

  1. def combinations(a, k):
  2. a = sorted(a)
  3. while True:
  4. yield a[:k]
  5. for i in range(k - 1, -1, -1):
  6. # a[i + 1 :] is sorted. Sort a[i:].
  7. x = a[i]
  8. j = i + 1
  9. while j &lt; len(a):
  10. if x &lt; a[j]:
  11. break
  12. a[j - 1] = a[j]
  13. j += 1
  14. a[j - 1] = x
  15. # a[j] is the next greatest element after x. If there are enough
  16. # elements after it, rotate them into position.
  17. if j + (k - i) &lt;= len(a):
  18. rotate(a, i, j, j + (k - i))
  19. break
  20. else:
  21. break
  22. def rotate(a, i, j, k):
  23. reverse(a, i, j)
  24. reverse(a, j, k)
  25. reverse(a, i, k)
  26. def reverse(a, i, k):
  27. for j in range((k - i) // 2):
  28. a[i + j], a[(k - 1) - j] = a[(k - 1) - j], a[i + j]
  29. print(*combinations([0, 1, 2, 2, 4], 2))
  30. print(*combinations([0, 1, 2, 2, 4], 3))
  31. print(*combinations([0, 0, 0, 1, 1, 1], 3))

Output:

  1. [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
  2. [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
  3. [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]

答案2

得分: 1

以下是使用Python生成这些内容的代码。

我正在使用迭代器。将其转化为另一种语言可能是翻译这个过程中最困难的部分。

  1. def unique_multisets(multiset, k):
  2. # 查找每个项目的重复次数。
  3. count = {}
  4. for x in multiset:
  5. count[x] = 1 + count.get(x, 0)
  6. # 将其转化为(值,重复次数)对。reverse是因为我们稍后会在生成多重集时将其反转。
  7. pairs = list(reversed(count.items()))
  8. # 计算累积总数。
  9. total = 0
  10. totals = []
  11. for p in pairs:
  12. total += p[1]
  13. totals.append(total)
  14. def recurse(partial, idx):
  15. if k == len(partial):
  16. yield partial
  17. elif idx < 0 or totals[idx] < k - len(partial):
  18. pass # 无法生成组合。
  19. else:
  20. (value, repeat) = pairs[idx]
  21. max_used = min(repeat, k - len(partial))
  22. for i in range(1 + max_used):
  23. partial.append(value)
  24. for i in range(1 + max_used):
  25. partial.pop()
  26. yield from recurse(partial, idx - 1)
  27. yield from recurse([], len(pairs) - 1)
  28. for x in unique_multisets([0, 1, 2, 2, 4], 2):
  29. print(x)

希望这可以帮助您理解这段代码。

英文:

Here is code that generates these one at a time in Python.

I'm using iterators. Translating away from that would be the hardest part of translating this to another language.

  1. def unique_multisets (multiset, k):
  2. # Find the repetition of each item.
  3. count = {}
  4. for x in multiset:
  5. count[x] = 1 + count.get(x, 0)
  6. # Turn this into (value, count) pairs. The reversed is
  7. # because we will reverse it later in generating the multisets.
  8. pairs = list(reversed(count.items()))
  9. # Calculate a running total.
  10. total = 0
  11. totals = []
  12. for p in pairs:
  13. total += p[1]
  14. totals.append(total)
  15. def recurse (partial, idx):
  16. if k == len(partial):
  17. yield partial
  18. elif idx &lt; 0 or totals[idx] &lt; k - len(partial):
  19. pass # Can&#39;t make combinations.
  20. else:
  21. (value, repeat) = pairs[idx]
  22. max_used = min(repeat, k - len(partial))
  23. for i in range(1 + max_used):
  24. partial.append(value)
  25. for i in range(1 + max_used):
  26. partial.pop()
  27. yield from recurse(partial, idx - 1)
  28. yield from recurse([], len(pairs)-1)
  29. for x in unique_multisets([0, 1, 2, 2, 4], 2):
  30. print(x)

答案3

得分: 1

以下是我为这个任务开发的一个算法:

  1. def multiset_combinations(v, m):
  2. v = sorted(v)
  3. f = [0]
  4. j = 0
  5. # 填充f,一个扩展索引的列表。例如,
  6. #
  7. # v = [2, 2, 33, 33, 33, 40]
  8. #
  9. # f将是:
  10. #
  11. # [0, 0, 1, 1, 1, 2]
  12. for i in range(1, len(v)):
  13. if v[i] != v[i - 1]:
  14. j += 1
  15. f.append(j)
  16. v = list(set(v))
  17. r = [0] * m
  18. n = len(v)
  19. z = f[:m]
  20. idx = [0] * n
  21. p = len(f) - m
  22. last = f[-m:]
  23. last[-1] += 1
  24. # 找到每个索引的第一个出现。例如,对于上面的示例,idx将是:
  25. #
  26. # [0, 2, 5]
  27. for i in range(n):
  28. idx[i] = f.index(i)
  29. # 主要算法
  30. while True:
  31. while z[-1] < n:
  32. for j in range(m):
  33. r[j] = v[z[j]]
  34. z[-1] += 1
  35. yield r.copy()
  36. if z == last:
  37. break
  38. for i in range(m - 2, -1, -1):
  39. if z[i] != f

    :

  40. z[i] += 1

  41. for j, k in zip(range(i + 1, m),

  42. range(idx[z[i]] + 1, idx[z[i]] + m)):

  43. z[j] = f[k]

  44. break

调用它:

  1. print(*multiset_combination([0, 1, 2, 2, 4], 2))
  2. #> [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
  3. print(*multiset_combination([0, 1, 2, 2, 4], 3))
  4. #> [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
  5. print(*multiset_combination([0, 0, 0, 1, 1, 1], 3))
  6. #> [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]

它也相当高效。它可以在不到一秒的时间内生成753564

  1. v1 = [1,
  2. 2, 2,
  3. 3, 3, 3,
  4. 4, 4, 4, 4,
  5. 5, 5, 5, 5, 5,
  6. 6,
  7. 7, 7,
  8. 8, 8, 8,
  9. 9, 9, 9, 9,
  10. 10, 10, 10, 10, 10,
  11. 11,
  12. 12, 12,
  13. 13, 13, 13,
  14. 14, 14, 14, 14,
  15. 15, 15, 15, 15, 15]
  16. def get_time(v, m):
  17. t1 = time.time()
  18. list(multiset_combinations(v, m))
  19. t2 = time.time()
  20. return t2 - t1
  21. get_time(v1, 10)
  22. #> 0.8702290058135986
英文:

Below is an algorithm I developed some time back for this task:

  1. def multiset_combinations(v, m):
  2. v = sorted(v)
  3. f = [0]
  4. j = 0
  5. # Populating f, a list of expanded indices. E.g. For
  6. #
  7. # v = [2, 2, 33, 33, 33, 40]
  8. #
  9. # f would be:
  10. #
  11. # [0, 0, 1, 1, 1, 2]
  12. for i in range(1, len(v)):
  13. if v[i] != v[i - 1]:
  14. j += 1
  15. f.append(j)
  16. v = list(set(v))
  17. r = [0] * m
  18. n = len(v)
  19. z = f[:m]
  20. idx = [0] * n
  21. p = len(f) - m
  22. last = f[-m:]
  23. last[-1] += 1
  24. # Find the first occurence of each index. E.g. For the
  25. # same example above, idx would be:
  26. #
  27. # [0, 2, 5]
  28. for i in range(n):
  29. idx[i] = f.index(i)
  30. # The main algorithm
  31. while True:
  32. while z[-1] &lt; n:
  33. for j in range(m):
  34. r[j] = v[z[j]]
  35. z[-1] += 1
  36. yield r.copy()
  37. if z == last:
  38. break
  39. for i in range(m - 2, -1, -1):
  40. if z[i] != f

    :

  41. z[i] += 1

  42. for j, k in zip(range(i + 1, m),

  43. range(idx[z[i]] + 1, idx[z[i]] + m)):

  44. z[j] = f[k]

  45. break

Calling it:

  1. print(*multiset_combination([0, 1, 2, 2, 4], 2))
  2. #&gt; [0, 1] [0, 2] [0, 4] [1, 2] [1, 4] [2, 2] [2, 4]
  3. print(*multiset_combination([0, 1, 2, 2, 4], 3))
  4. #&gt; [0, 1, 2] [0, 1, 4] [0, 2, 2] [0, 2, 4] [1, 2, 2] [1, 2, 4] [2, 2, 4]
  5. print(*multiset_combination([0, 0, 0, 1, 1, 1], 3))
  6. #&gt; [0, 0, 0] [0, 0, 1] [0, 1, 1] [1, 1, 1]

It is pretty efficient as well. It can generate 753564 in just under a second:

  1. v1 = [1,
  2. 2, 2,
  3. 3, 3, 3,
  4. 4, 4, 4, 4,
  5. 5, 5, 5, 5, 5,
  6. 6,
  7. 7, 7,
  8. 8, 8, 8,
  9. 9, 9, 9, 9,
  10. 10, 10, 10, 10, 10,
  11. 11,
  12. 12, 12,
  13. 13, 13, 13,
  14. 14, 14, 14, 14,
  15. 15, 15, 15, 15, 15]
  16. def get_time(v, m):
  17. t1 = time.time()
  18. list(multiset_combinations(v, m))
  19. t2 = time.time()
  20. return t2 - t1
  21. get_time(v1, 10)
  22. #&gt; 0.8702290058135986

huangapple
  • 本文由 发表于 2023年4月20日 05:17:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76058880.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定