Efficiently convert a list of probabilities into a list of 0/1 by taking a percentage of the highest probabilities, without reindexing
Question
Problem
Given a huge array of probabilities and a list of percentages to take:

probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]  # Percentage of the list size

I want to efficiently compute, for each percentage, a 0/1 list that marks the highest probabilities making up that share of the list:

{20: [0, 0, 0, 0, 1, 0, 0], 30: [0, 0, 1, 0, 1, 0, 0], 40: [0, 0, 1, 0, 1, 0, 1]}

I cannot lose the indexing: values have to keep their original positions.

My attempts so far:

Solution number 1
def mark_probabilities_for_multiple_N1(probabilities, N_percentages):
    marked_lists = {}
    sorted_indices = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    list_size = len(probabilities)
    for N_percentage in N_percentages:
        N = int(N_percentage * list_size / 100)
        marked_lists[N_percentage] = [1 if i in sorted_indices[:N] else 0 for i in range(len(probabilities))]
        # Utilize previously calculated marked lists for smaller N values
        for prev_N_percentage in [prev_N for prev_N in marked_lists if prev_N < N_percentage]:
            marked_lists[N_percentage] = [1 if marked_lists[prev_N_percentage][i] == 1 or marked_lists[N_percentage][i] == 1 else 0 for i in range(len(probabilities))]
    return marked_lists
Solution number 2 - use heapq
Pair each index with its probability as (idx, probability_value) and use heapq to pick the largest pairs, ordered by probability_value.
import heapq


def indicies_n_largest(values_with_indicies, percentage) -> set[int]:  # a set gives O(1) membership tests
    """
    Returns the set of indices of the n largest probabilities in the array.
    :param values_with_indicies: list of (index, probability) pairs
    :param percentage: percentage of the largest probabilities to be returned
    returns: set of indices of the largest probabilities
    """
    fraction = percentage / 100
    samples_num = int(len(values_with_indicies) * fraction)
    result = heapq.nlargest(samples_num, values_with_indicies, key=lambda x: x[1])
    # Return a set so the result matches the annotated return type
    return {x[0] for x in result}


def percentage_indicies_map(action_probs, percentages) -> dict[int, set[int]]:
    """
    Given action probabilities and a list of percentages, return a map of actions' indices that are considered good,
    for each percentage.
    """
    values_wth_indicies = [(i, x) for i, x in enumerate(action_probs)]
    percentage_indicies_map: dict[int, set[int]] = {}  # sets of indices of the largest probabilities
    for percentage in percentages:
        percentage_indicies_map[percentage] = indicies_n_largest(values_wth_indicies, percentage)
    return percentage_indicies_map
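
One possible refinement of the heapq attempt, offered as a sketch rather than a definitive fix: `heapq.nlargest` returns its results ordered from largest to smallest, so it can be called once for the largest percentage and the result sliced for the smaller ones. The function name `percentage_index_sets` is hypothetical. The count is rounded with `round` here as an assumption, because `int` truncates 40% of 7 (2.8) to 2, while the expected output in the question marks 3 entries.

```python
import heapq


def percentage_index_sets(action_probs, percentages):
    """Map each percentage to the set of indices of the top-N% probabilities."""
    size = len(action_probs)
    # Assumption: round the count; int() would truncate 40% of 7 (2.8) to 2.
    counts = {p: round(size * p / 100) for p in percentages}
    max_count = max(counts.values(), default=0)
    # nlargest returns pairs ordered from largest to smallest probability,
    # so any prefix of the result is itself a "top k" selection.
    top = heapq.nlargest(max_count, enumerate(action_probs), key=lambda pair: pair[1])
    top_indices = [i for i, _ in top]
    return {p: set(top_indices[:n]) for p, n in counts.items()}


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
index_sets = percentage_index_sets(probabilities, [20, 30, 40])
# Turn each index set into a 0/1 list that keeps the original positions.
marked = {p: [int(i in s) for i in range(len(probabilities))] for p, s in index_sets.items()}
print(marked)
```

The heap is built once instead of once per percentage, and the final comprehension relies on set membership to keep every value at its original position.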
Answer 1
Score: 1
You can try:
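probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]

# Sort (index, probability) pairs once, highest probability first
out, s = {}, sorted(enumerate(probabilities), key=lambda k: -k[1])
for p in N_percentages:
    # Indices of the top p% probabilities
    ones = set(i for i, _ in s[:round((p / 100) * len(probabilities))])
    # Store the 0/1 list under its percentage, keeping original positions
    out[p] = [int(i in ones) for i in range(len(probabilities))]
print(out)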
Prints:
{
20: [0, 0, 0, 0, 1, 0, 0],
30: [0, 0, 1, 0, 1, 0, 0],
40: [0, 0, 1, 0, 1, 0, 1]
}
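
If rebuilding each 0/1 list from scratch is a concern for a very large input, one option is to walk the percentages in ascending order and only flip the entries that are new for each step. The following is a minimal sketch under that assumption, not a benchmarked implementation; the function name `mark_top_percentages` is hypothetical, and it assumes the same `round`-based count as the answer above.

```python
def mark_top_percentages(probabilities, percentages):
    """Return {percentage: 0/1 list} marking the top-N% probabilities in place."""
    size = len(probabilities)
    # Indices ordered from highest to lowest probability (sorted once).
    order = sorted(range(size), key=probabilities.__getitem__, reverse=True)
    marks = [0] * size
    done = 0  # how many of the top indices are already marked
    out = {}
    for p in sorted(percentages):
        # Same rounding as the answer above, clamped to the valid range.
        n = max(0, min(size, round(p / 100 * size)))
        for i in order[done:n]:
            marks[i] = 1  # flip only the entries that are new at this step
        done = max(done, n)
        out[p] = marks.copy()  # snapshot, so later percentages do not alter it
    return out


result = mark_top_percentages([0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6], [20, 30, 40])
print(result)
```

Each percentage still costs one list copy, but the membership test per element is gone and the sort happens only once. Note that Python's `round` rounds exact halves to the nearest even integer, so a count of exactly 2.5 becomes 2.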