Sort list of dictionaries based on values with highest similarity

Question

Given the following Python list of lists of dictionaries (one inner list per round):

results = [[{'id': '001', 'result': [0,0,0,0,1]},
           {'id': '002', 'result': [1,1,1,1,1]},
           {'id': '003', 'result': [0,1,1,None,None]},
           {'id': '004', 'result': [0,None,None,1,0]},
           {'id': '005', 'result': [1,0,None,1,1]},
           {'id': '006', 'result': [0,0,0,1,1]}],
          [{'id': '001', 'result': [1,0,1,0,1]},
           {'id': '002', 'result': [1,1,1,1,1]},
           {'id': '003', 'result': [0,1,1,None,None]},
           {'id': '004', 'result': [0,None,None,1,0]},
           {'id': '005', 'result': [1,0,None,1,1]},
           {'id': '006', 'result': [1,0,1,0,1]}]
            ]

I would like to generate a new sorted list (in both Python and Go) based on the values of 'result': compare the results of every pair of players ('id') in each round, then sort the players by the number of matching entries. None results are discarded and not counted.

Across the first and second rounds, 001 and 006 had nine matching answers in total.
During the first round, 001 and 006 had four matching answers:
001 = [0,0,0,0,1] 006 = [0,0,0,1,1]
During the second round, 001 and 006 had five matching answers:
001 = [1,0,1,0,1] 006 = [1,0,1,0,1]

The expected output is:

sorted_results = ['001','006','002','005','003','004']

'001' and '006' are the first two items in the list because they have the highest number of matching results: nine.
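
The counting rule described above can be sketched as follows. The helper name `count_matches` is hypothetical and not part of the original question; it assumes that a position counts as a match only when both answers are present (not None) and equal.

```python
def count_matches(a, b):
    """Count positions where both answers are present and equal."""
    # If x is not None and x == y, then y cannot be None either.
    return sum(1 for x, y in zip(a, b) if x is not None and x == y)


# Players 001 and 006, round by round (data from the question).
round1 = count_matches([0, 0, 0, 0, 1], [0, 0, 0, 1, 1])
round2 = count_matches([1, 0, 1, 0, 1], [1, 0, 1, 0, 1])
total = round1 + round2

print(round1, round2, total)  # 4 5 9
```

Under this reading, a position where both players have None is skipped rather than counted as a match.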

Answer 1

Score: 1

If you sort those items by the "highest number of identical results" within each player's own result list (in ascending order, so the most uniform list comes last), this is what you get:

['003', '004', '005', '006', '001', '002']

If you meant something else (that is, not "highest number of identical results"), please clarify your question. Also, you can simply modify the max_identical function so that it acts according to your definition of similar.

The above result was computed with:

from collections import defaultdict


results = [{'id': '001', 'result': [0, 0, 0, 0, 1]},
           {'id': '002', 'result': [1, 1, 1, 1, 1]},
           {'id': '003', 'result': [0, 1, 1, None, None]},
           {'id': '004', 'result': [0, None, None, 1, 0]},
           {'id': '005', 'result': [1, 0, None, 1, 1]},
           {'id': '006', 'result': [0, 0, 0, 1, 1]}]


def max_identical(lst):
    # Count how often each non-None value occurs and return the largest count.
    counts = defaultdict(int)
    for x in lst:
        if x is not None:
            counts[x] += 1
    return max(counts.values())


# Sort ascending by that count; ties keep their original order.
results = sorted(results, key=lambda x: max_identical(x['result']))

print([x['id'] for x in results])
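
As one possible way to adapt the key function to the question's definition of similarity, here is a minimal sketch for a single round. The helpers `count_matches` and `best_match` are hypothetical names introduced for illustration, and the sketch assumes that each player is ranked by their best match count against any other player, with None answers skipped.

```python
results = [{'id': '001', 'result': [0, 0, 0, 0, 1]},
           {'id': '002', 'result': [1, 1, 1, 1, 1]},
           {'id': '003', 'result': [0, 1, 1, None, None]},
           {'id': '004', 'result': [0, None, None, 1, 0]},
           {'id': '005', 'result': [1, 0, None, 1, 1]},
           {'id': '006', 'result': [0, 0, 0, 1, 1]}]


def count_matches(a, b):
    """Count positions where both answers are present and equal."""
    return sum(1 for x, y in zip(a, b) if x is not None and x == y)


def best_match(player, players):
    """Highest match count between `player` and any other player."""
    return max(count_matches(player['result'], other['result'])
               for other in players if other['id'] != player['id'])


# Highest best-match count first; ties keep their original order.
ranked = sorted(results, key=lambda p: best_match(p, results), reverse=True)
ids = [p['id'] for p in ranked]

print(ids)  # ['001', '006', '002', '005', '003', '004']
```

For this first-round data the ordering happens to agree with the list the question expects, but whether "best single partner" is the intended definition is an assumption.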

Answer 2

Score: 0

While looking for a solution to a problem very similar to yours, I found this page:
http://w3facility.org/question/sorting-a-python-dictionary-after-running-an-itertools-function/

Using your example:

import itertools
results = [[{'id': '001', 'result': [0,0,0,0,1]},
            {'id': '002', 'result': [1,1,1,1,1]},
            {'id': '003', 'result': [0,1,1,None,None]},
            {'id': '004', 'result': [0,None,None,1,0]},
            {'id': '005', 'result': [1,0,None,1,1]},
            {'id': '006', 'result': [0,0,0,1,1]}],
           [{'id': '001', 'result': [1,0,1,0,1]},
            {'id': '002', 'result': [1,1,1,1,1]},
            {'id': '003', 'result': [0,1,1,None,None]},
            {'id': '004', 'result': [0,None,None,1,0]},
            {'id': '005', 'result': [1,0,None,1,1]},
            {'id': '006', 'result': [1,0,1,0,1]}]
          ]

This creates an all-vs-all comparison of the ids, once for each round, and sums the matches per pair.

similarity = {}
# Round 1: score every pair of players.
# None answers are skipped, as the question requires.
for p1, p2 in itertools.combinations(results[0], 2):
    similarity.setdefault((p1["id"], p2["id"]), sum([1 for i in range(len(p1["result"])) if p1["result"][i] is not None and p1["result"][i] == p2["result"][i]]))
# Round 2: add each pair's matches to its round-1 total.
for p1, p2 in itertools.combinations(results[1], 2):
    similarity.setdefault((p1["id"], p2["id"]), 0)
    similarity[(p1["id"], p2["id"])] += sum([1 for i in range(len(p1["result"])) if p1["result"][i] is not None and p1["result"][i] == p2["result"][i]])

Sorting the id pairs by their match totals, highest first, returns an ordered list of id tuples.

similarity = sorted(similarity, key=lambda x:similarity[x], reverse=True)
print(similarity)

To eliminate the duplicate ids, it is only necessary to keep the first occurrence of each id, in that order, and ignore the rest.

sorted_ids = []
for tuple_id in similarity:
    # Keep each id the first time it appears in a ranked pair.
    if tuple_id[0] not in sorted_ids:
        sorted_ids.append(tuple_id[0])
    if tuple_id[1] not in sorted_ids:
        sorted_ids.append(tuple_id[1])

print(sorted_ids)
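
The steps above can also be folded into one function that handles any number of rounds. This is only a sketch under stated assumptions: the names `count_matches` and `rank_by_similarity` are hypothetical, None answers are skipped when counting, and ties between pairs are broken by the order in which the pairs were first seen.

```python
from collections import Counter
from itertools import combinations

results = [[{'id': '001', 'result': [0, 0, 0, 0, 1]},
            {'id': '002', 'result': [1, 1, 1, 1, 1]},
            {'id': '003', 'result': [0, 1, 1, None, None]},
            {'id': '004', 'result': [0, None, None, 1, 0]},
            {'id': '005', 'result': [1, 0, None, 1, 1]},
            {'id': '006', 'result': [0, 0, 0, 1, 1]}],
           [{'id': '001', 'result': [1, 0, 1, 0, 1]},
            {'id': '002', 'result': [1, 1, 1, 1, 1]},
            {'id': '003', 'result': [0, 1, 1, None, None]},
            {'id': '004', 'result': [0, None, None, 1, 0]},
            {'id': '005', 'result': [1, 0, None, 1, 1]},
            {'id': '006', 'result': [1, 0, 1, 0, 1]}]]


def count_matches(a, b):
    """Count positions where both answers are present and equal."""
    return sum(1 for x, y in zip(a, b) if x is not None and x == y)


def rank_by_similarity(rounds):
    """Order player ids by their best pairwise match total over all rounds."""
    totals = Counter()
    for players in rounds:
        for p1, p2 in combinations(players, 2):
            # Sort the key so a pair is the same regardless of player order.
            pair = tuple(sorted((p1['id'], p2['id'])))
            totals[pair] += count_matches(p1['result'], p2['result'])

    ordered = []
    # Highest total first; sorted() is stable, so ties keep insertion order.
    for pair, _ in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        for player_id in pair:
            if player_id not in ordered:
                ordered.append(player_id)
    return ordered


print(rank_by_similarity(results))  # ['001', '006', '002', '005', '003', '004']
```

Using a Counter keyed by the id pair avoids the separate setdefault handling for the first round, at the cost of one extra import.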

huangapple
  • Posted on October 1, 2013, 22:33:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19118968.html