Remove redundant tuples from dictionary based on the score
Question
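The following code removes tuples whose score falls below a fixed threshold, recursing into nested dictionaries with Python list and dictionary comprehensions:
# Original dictionary
a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],
    'actor': {
        'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745), ('car', 0.6297345748705596)],
        'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],
        'fri': [('fri', 1.0), ('compats', 1.0), ('mo', 0.814126196299157), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('bro', 0.8496284151715676), ('fami', 0.6375584385858655), ('best fri', 0.807654599975373)]
    }
}
# Remove tuples whose score is below the threshold
threshold = 0.7  # tuples scoring below this value are dropped
def filter_by_threshold(obj, threshold):
    if isinstance(obj, list):
        return [item for item in obj if item[1] >= threshold]
    if isinstance(obj, dict):
        # Recurse so that the lists nested under 'actor' are filtered too
        return {key: filter_by_threshold(value, threshold) for key, value in obj.items()}
    return obj
new_a = filter_by_threshold(a, threshold)
# Print the result
print(new_a)
After this runs, new_a contains only the tuples that score at least threshold; adjust threshold to use a different cutoff. Note that a fixed cutoff is not the same as removing the lower-scoring duplicate: it drops ('parent', 0.32202418215134565), which has no duplicate, and keeps ('mo', 0.814126196299157) under 'fri' even though 'fam' holds ('mo', 1.0).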
I wonder if there is a fast way to remove redundant tuples from a dictionary. Suppose I have a dictionary like the one below:
a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],
    'actor': {
        'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745), ('car', 0.6297345748705596)],
        'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],
        'fri': [('fri', 1.0), ('compats', 1.0), ('mo', 0.814126196299157), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('bro', 0.8496284151715676), ('fami', 0.6375584385858655), ('best fri', 0.807654599975373)]
    }
}
In this dictionary, for example, we have tuples like ('car', 0.7759192750865143) under the key 'trans' and ('car', 0.6297345748705596) under the key 'autori'. I want to remove the tuple ('car', 0.6297345748705596) because it has the lower score.
My desired output is:
new_a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],
    'actor': {
        'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745)],
        'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],
        'fri': [('fri', 1.0), ('compats', 1.0), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('best fri', 0.807654599975373)]
    }
}
Is there a fast way to do this, or do we still need to loop through all the values for each key?
Answer 1
Score: 1
<sub>Not sure it's the most efficient, but since you also mentioned "a simple solution" in a comment...</sub>
I think the simplest method would involve looping through every tuple twice: once to collect the best score for each name, and then again to filter out everything that scores lower. Something like <kbd>new_a = onlyBest(a, bestRef=dict(sorted(getAllPairs(a))))</kbd> [see function definitions below]. Sorting puts the highest score for each name last, so dict() keeps that one.
def getAllPairs(obj):
    # Recursively collect every 2-tuple found inside nested dicts and iterables
    if isinstance(obj, tuple) and len(obj)==2: return [obj]
    allPairs = []
    if isinstance(obj, dict): obj = obj.values()
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        for i in obj: allPairs += getAllPairs(i)
    return allPairs

def onlyBest(obj, bestRef:dict):
    # Rebuild the structure, dropping pairs that score below the best score for their name
    if isinstance(obj, list):
        # if all(isinstance(i, tuple) and len(i)==2 for i in obj):
        return [i for i in obj if not i[1] < bestRef.get(i[0],i[1])]
    if isinstance(obj, dict):
        return {k: onlyBest(v,bestRef) for k, v in obj.items()}
    return obj
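To see why dict(sorted(...)) ends up holding the best score for each name, here is a small sketch on made-up pairs (the shortened scores and the names `pairs`, `best_ref` and `best_linear` are illustrative assumptions, not part of the answer). Tuples sort by name first and score second, and dict() keeps the last value it sees for a repeated key. The loop afterwards builds the same mapping without sorting.

```python
# Hypothetical pairs with shortened scores, for illustration only
pairs = [('car', 0.78), ('mo', 1.0), ('car', 0.63), ('mo', 0.81)]

# sorted() orders by name, then by score, so the highest score per name comes last;
# dict() keeps the last value seen for each repeated key
best_ref = dict(sorted(pairs))
print(best_ref)  # {'car': 0.78, 'mo': 1.0}

# Same mapping without sorting: track the running maximum per name
best_linear = {}
for name, score in pairs:
    if score > best_linear.get(name, float('-inf')):
        best_linear[name] = score

assert best_linear == best_ref
```

The sorted version is shorter to write; the loop avoids the sort, which may matter for large inputs.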
Answer 2
Score: 0
To remove the lower values, you need to detect duplicates, compare their scores, keep track of the highest score seen for each name, and remove any tuple for which a higher score exists. The algorithm you want is at least O(n) in time and O(n) in space, where n is the total number of tuples.
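One way the steps described above could be sketched is with two linear passes over the data: the first records the highest score per name in a dictionary, and the second rebuilds the structure, keeping only tuples that match that best score. The helper names (`iter_pairs`, `best_scores`, `keep_best`) and the shortened sample data are hypothetical, and the sketch assumes every list holds only (name, score) tuples.

```python
def iter_pairs(obj):
    """Yield every (name, score) tuple from nested dicts of lists."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from iter_pairs(value)
    elif isinstance(obj, list):
        # Assumption: lists contain only (name, score) tuples
        yield from obj


def best_scores(obj):
    """First pass: map each name to the highest score seen anywhere."""
    best = {}
    for name, score in iter_pairs(obj):
        if name not in best or score > best[name]:
            best[name] = score
    return best


def keep_best(obj, best):
    """Second pass: rebuild the structure, dropping lower-scoring duplicates."""
    if isinstance(obj, dict):
        return {key: keep_best(value, best) for key, value in obj.items()}
    if isinstance(obj, list):
        return [(name, score) for name, score in obj if score >= best[name]]
    return obj


# Shortened, made-up version of the question's data
sample = {
    'trans': [('boat', 1.0), ('car', 0.78)],
    'actor': {
        'autori': [('pol', 1.0), ('car', 0.63)],
        'fam': [('mo', 1.0), ('fami', 0.84)],
        'fri': [('mo', 0.81), ('fami', 0.64), ('parent', 0.32)],
    },
}

result = keep_best(sample, best_scores(sample))
print(result)
```

Each tuple is visited once per pass and the dictionary holds at most one entry per distinct name, so this stays within the O(n) time and space mentioned above. If the same name appears twice with an equal top score, both copies are kept.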