May 21, 2023, 06:58:37

Efficiently convert a list of probabilities into a list of 0/1 by taking a % of the highest probabilities without reindexing

Problem

Given a huge array of probabilities and a list of percentages to take:

```python
probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]  # Percentages of the list size
```

I want to efficiently compute, for each percentage, a 0/1 list that marks the highest probabilities:

```python
{20: [0, 0, 0, 0, 1, 0, 0], 30: [0, 0, 1, 0, 1, 0, 0], 40: [0, 0, 1, 0, 1, 0, 1]}
```

I cannot lose the indexing: every value has to keep its original position.
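For reference, the number of ones per percentage in the expected output can be checked with a short sketch. It assumes the count is the rounded share of the list length, which is inferred from the expected output rather than stated in the question. `count_for` is a hypothetical helper name.

```python
def count_for(percentage: int, size: int) -> int:
    """Number of entries to mark, assuming the share is rounded (hypothetical helper)."""
    return round(percentage * size / 100)


size = 7  # length of the example list
for percentage in (20, 30, 40):
    exact = percentage * size / 100
    # Columns: percentage, exact share, truncated count, rounded count
    print(percentage, exact, int(exact), count_for(percentage, size))
```

Truncating with `int()` and rounding with `round()` disagree for 40% of 7 items (2 versus 3), and only the rounded count matches the expected output. Note that Python's `round()` sends exact halves to the nearest even integer, so `round(2.5)` is 2.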

My attempts so far:

Solution number 1

```python
def mark_probabilities_for_multiple_N1(probabilities, N_percentages):
    marked_lists = {}
    # Indices ordered from highest to lowest probability
    sorted_indices = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    list_size = len(probabilities)

    for N_percentage in N_percentages:
        # int() truncates, so 40% of 7 items gives N = 2 (the expected output above marks 3)
        N = int(N_percentage * list_size / 100)

        # Membership test against a list slice is O(N) per element
        marked_lists[N_percentage] = [1 if i in sorted_indices[:N] else 0 for i in range(len(probabilities))]

        # Utilize previously calculated marked lists for smaller N values
        for prev_N_percentage in [prev_N for prev_N in marked_lists if prev_N < N_percentage]:
            marked_lists[N_percentage] = [1 if marked_lists[prev_N_percentage][i] == 1 or marked_lists[N_percentage][i] == 1 else 0 for i in range(len(probabilities))]
    return marked_lists
```
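Most of the cost in Solution 1 comes from testing `i in sorted_indices[:N]` for every element, and from merging with earlier results that are already contained in the larger slice. One possible alternative, sketched below, computes each index's rank once and then compares ranks against `N`. The function name `mark_top_percentages` is hypothetical, and the use of `round()` instead of `int()` is an assumption made to match the expected output.

```python
def mark_top_percentages(probabilities, percentages):
    """Return {percentage: 0/1 list}, marking the highest probabilities in place."""
    size = len(probabilities)
    # Sort the indices once, highest probability first.
    order = sorted(range(size), key=lambda i: probabilities[i], reverse=True)
    # rank[i] is the position of index i in that ordering (0 = highest).
    rank = [0] * size
    for position, index in enumerate(order):
        rank[index] = position

    marked = {}
    for percentage in percentages:
        # Assumption: round() instead of int(), to match the expected output.
        n = round(percentage * size / 100)
        marked[percentage] = [1 if r < n else 0 for r in rank]
    return marked


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
result = mark_top_percentages(probabilities, [20, 30, 40])
print(result)
```

This keeps one sort of O(n log n) and one linear pass per percentage, with no slicing or set construction inside the loop.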

Solution number 2 - use heapq

Build (index, probability_value) pairs and use heapq to pick the largest ones, ordered by probability_value

```python
import heapq


def indicies_n_largest(values_with_indicies, percentage) -> set[int]:  # set gives O(1) membership tests
    """
    Returns the set of indices of the n largest probabilities.
    :param values_with_indicies: list of (index, probability) pairs
    :param percentage: percentage of the largest probabilities to be returned
    returns: set of indices of the largest probabilities
    """
    fraction = percentage / 100
    samples_num = int(len(values_with_indicies) * fraction)
    result = heapq.nlargest(samples_num, values_with_indicies, key=lambda x: x[1])
    # Return a set so the result matches the annotated return type
    return {x[0] for x in result}


def percentage_indicies_map(action_probs, percentages) -> dict[int, set[int]]:
    """
    Given action probabilities and a list of percentages, return a map from each
    percentage to the indices of the actions that are considered good.
    """
    values_wth_indicies = list(enumerate(action_probs))
    # percentage -> indices of the largest probabilities
    indicies_by_percentage: dict[int, set[int]] = {}
    for percentage in percentages:
        indicies_by_percentage[percentage] = indicies_n_largest(values_wth_indicies, percentage)
    return indicies_by_percentage
```
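Solution 2 runs a separate heap selection for every percentage. Since `heapq.nlargest` returns its result sorted from largest to smallest, a single call for the largest requested count should be enough, with each smaller request taken as a prefix of that result. The sketch below illustrates the idea. `top_indices_by_percentage` is a hypothetical name, and the `int()` truncation is carried over from Solution 2.

```python
import heapq


def top_indices_by_percentage(probabilities, percentages):
    """Return {percentage: set of indices of the largest probabilities}."""
    size = len(probabilities)
    # int() truncation is kept from Solution 2; swap in round() if preferred.
    counts = {p: int(size * p / 100) for p in percentages}
    largest_count = max(counts.values(), default=0)
    # One heap pass for the largest request; the result is sorted descending.
    top = heapq.nlargest(largest_count, range(size), key=probabilities.__getitem__)
    # Every smaller request is a prefix of that result.
    return {p: set(top[:n]) for p, n in counts.items()}


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
indices = top_indices_by_percentage(probabilities, [20, 30, 40])
print(indices)
```

With truncation, 30% and 40% of 7 items both select two indices, which is why this variant marks fewer entries for 40% than the expected output in the question.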

Answer 1

Score: 1

You can try:

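```python
probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
N_percentages = [20, 30, 40]

# Sort (index, probability) pairs once, highest probability first
out, s = {}, sorted(enumerate(probabilities), key=lambda k: -k[1])
for p in N_percentages:
    # Indices of the top p% of probabilities
    ones = set(i for i, _ in s[:round((p / 100) * len(probabilities))])
    # 0/1 list in the original order, stored under its percentage
    out[p] = [int(i in ones) for i in range(len(probabilities))]
print(out)
```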

Prints:

```python
{
  20: [0, 0, 0, 0, 1, 0, 0],
  30: [0, 0, 1, 0, 1, 0, 0],
  40: [0, 0, 1, 0, 1, 0, 1]
}
```
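For a very large input, a variation on the same idea might reuse work between percentages and store the flags compactly. The sketch below processes the percentages in ascending order, flips only the positions added since the previous percentage, and keeps each result as a `bytes` snapshot of a shared `bytearray`. The name `incremental_masks` and the `bytes` storage are assumptions for illustration, not part of the answer above.

```python
def incremental_masks(probabilities, percentages):
    """Return {percentage: bytes of 0/1 flags}, reusing work between percentages."""
    size = len(probabilities)
    # Indices ordered from highest to lowest probability.
    order = sorted(range(size), key=lambda i: -probabilities[i])
    mask = bytearray(size)  # one byte per entry, all zeros initially
    out, done = {}, 0
    for p in sorted(percentages):
        n = round(p / 100 * size)
        # Only the positions added since the previous percentage are touched.
        for index in order[done:n]:
            mask[index] = 1
        done = max(done, n)
        out[p] = bytes(mask)  # immutable snapshot; list(out[p]) gives a 0/1 list
    return out


probabilities = [0.1, 0.4, 0.7, 0.2, 0.9, 0.5, 0.6]
masks = incremental_masks(probabilities, [20, 30, 40])
print({p: list(m) for p, m in masks.items()})
```

Each snapshot still costs a linear copy, so the gain is in avoiding the per-percentage set construction and membership tests, and in using one byte per flag instead of a list of Python integers.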
