Efficiently convert a list of probabilities into a list of 0/1 by taking a % of the highest probabilities without reindexing

Problem

Given a huge array of probabilities and the percentages to take:

    probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
    N_percentages = [20, 30, 40]  # Percentage of the list size

I want to efficiently compute:

    {20: [0, 0, 0, 0, 1, 0, 0], 30: [0, 0, 1, 0, 1, 0, 0], 40: [0, 0, 1, 0, 1, 0, 1]}

I cannot lose the indexing: each value has to keep its original position in the list.
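The expected output marks three entries for 40% of a seven-element list, which corresponds to rounding 2.8 to the nearest integer rather than truncating it. A small check of the two rounding rules, assuming the count is computed as percentage * list_size / 100 (the variable names here are illustrative only):

```python
list_size = 7  # length of the example list above

# Number of entries to mark under each rounding rule.
truncated = {p: int(p * list_size / 100) for p in (20, 30, 40)}
rounded = {p: round(p * list_size / 100) for p in (20, 30, 40)}

print(truncated)  # {20: 1, 30: 2, 40: 2}
print(rounded)    # {20: 1, 30: 2, 40: 3}
```

Only the rounded counts reproduce the expected output for 40%. Note that Python's built-in `round` rounds exact halves to the nearest even integer, so `round(2.5)` is 2.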

My attempts so far:

Solution number 1

    def mark_probabilities_for_multiple_N1(probabilities, N_percentages):
        marked_lists = {}
        sorted_indices = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
        list_size = len(probabilities)
        for N_percentage in N_percentages:
            N = int(N_percentage * list_size / 100)
            marked_lists[N_percentage] = [1 if i in sorted_indices[:N] else 0 for i in range(len(probabilities))]
            # Utilize previously calculated marked lists for smaller N values
            for prev_N_percentage in [prev_N for prev_N in marked_lists if prev_N < N_percentage]:
                marked_lists[N_percentage] = [1 if marked_lists[prev_N_percentage][i] == 1 or marked_lists[N_percentage][i] == 1 else 0 for i in range(len(probabilities))]
        return marked_lists
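The idea of reusing the result for a smaller percentage can be sketched more directly: sort the indices once, walk the percentages in ascending order, and extend a single running mask. This is a minimal sketch, not the asker's code; the function name `mark_top_percentages` is hypothetical, and it assumes `round` for the count so that the example reproduces the expected output.

```python
def mark_top_percentages(probabilities, percentages):
    """Hypothetical helper: map each percentage to a 0/1 list marking the top entries."""
    n = len(probabilities)
    # Sort the indices once, highest probability first.
    order = sorted(range(n), key=probabilities.__getitem__, reverse=True)
    marked = [0] * n  # running mask; it only gains ones as the percentage grows
    done = 0          # how many entries of `order` are already marked
    result = {}
    for p in sorted(percentages):
        k = round(p * n / 100)  # assumption: round, to match the expected output
        for i in order[done:k]:
            marked[i] = 1
        done = max(done, k)
        result[p] = marked.copy()  # snapshot, positions unchanged
    return result


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
print(mark_top_percentages(probabilities, [20, 30, 40]))
```

Under these assumptions the cost is one sort plus one list copy per percentage, and no membership test against a slice is needed.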

Solution number 2 - use heapq

Pair each index with its probability as (idx, probability_value), then use heapq to pick the pairs with the largest probability_value:

    import heapq


    def indicies_n_largest(values_with_indicies, percentage) -> set[int]:  # O(1) exists(int)
        """
        Returns the set of indices of the n largest probabilities.
        :param values_with_indicies: list of (index, probability) pairs
        :param percentage: percentage of the largest probabilities to be returned
        returns: set of indices of the largest probabilities
        """
        fraction = percentage / 100
        samples_num = int(len(values_with_indicies) * fraction)
        result = heapq.nlargest(samples_num, values_with_indicies, key=lambda x: x[1])
        # Return a set so that it matches the annotation and gives O(1) membership tests
        return {x[0] for x in result}


    def percentage_indicies_map(action_probs, percentages) -> dict[int, set[int]]:
        """
        Given action probabilities and a list of percentages, return a map of actions' indices that are considered good,
        for each percentage.
        """
        values_wth_indicies = [(i, x) for i, x in enumerate(action_probs)]
        percentage_indicies_map: dict[int, set[int]] = {}  # indices of the largest probabilities
        for percentage in percentages:
            percentage_indicies_map[percentage] = indicies_n_largest(values_wth_indicies, percentage)
        return percentage_indicies_map
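Because `heapq.nlargest` returns its result ordered from largest to smallest, one call for the largest requested count is enough, and every smaller percentage is a prefix of that result. A minimal sketch of this variation follows; the name `top_indices_by_percentage` is hypothetical, and the count is truncated with `int` as in the attempt above.

```python
import heapq


def top_indices_by_percentage(probabilities, percentages):
    """Hypothetical helper: map each percentage to the set of indices of the top entries."""
    n = len(probabilities)
    # Same truncation as in the attempt above.
    counts = {p: int(n * p / 100) for p in percentages}
    largest = max(counts.values(), default=0)
    # nlargest returns the (index, probability) pairs sorted from largest to smallest.
    ranked = heapq.nlargest(largest, enumerate(probabilities), key=lambda pair: pair[1])
    ranked_indices = [i for i, _ in ranked]
    # Each smaller count is a prefix of the ranking for the largest count.
    return {p: set(ranked_indices[:k]) for p, k in counts.items()}


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
print(top_indices_by_percentage(probabilities, [20, 30, 40]))
```

With truncation, 40% of seven entries yields two indices rather than three, so this variant does not reproduce the expected output for 40% unless the count is rounded instead.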

Answer 1

Score: 1

You can try:

    probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
    N_percentages = [20, 30, 40]

    out, s = {}, sorted(enumerate(probabilities), key=lambda k: -k[1])
    for p in N_percentages:
        ones = set(i for i, _ in s[:round((p / 100) * len(probabilities))])
        out[p] = [int(i in ones) for i in range(len(probabilities))]

    print(out)

Prints:

    {
        20: [0, 0, 0, 0, 1, 0, 0],
        30: [0, 0, 1, 0, 1, 0, 0],
        40: [0, 0, 1, 0, 1, 0, 1]
    }
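One detail the example does not exercise is how equal probabilities are handled. Python's `sorted` is stable, so with the same ranking as in the answer, tied values keep their original relative order and the earlier index is marked first. A small illustration with made-up input values:

```python
probabilities = [0.5, 0.9, 0.5, 0.1]  # illustrative input with a tie at 0.5

# Same ranking as in the answer: a stable sort on the negated probability.
ranked = sorted(enumerate(probabilities), key=lambda k: -k[1])
count = round((50 / 100) * len(probabilities))  # 2 entries
ones = {i for i, _ in ranked[:count]}
marked = [int(i in ones) for i in range(len(probabilities))]

print(marked)  # [1, 1, 0, 0]
```

Index 0 is marked and index 2 is not, even though both hold 0.5, because index 0 comes first in the input.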

huangapple
  • Posted on May 21, 2023, 06:58:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76297638.html