Remove redundant tuples from dictionary based on the score

Question

I wonder if there is a fast way to remove redundant tuples from a dictionary. Suppose I have the following dictionary:

a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],
    'actor': {
        'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745), ('car', 0.6297345748705596)],
        'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],
        'fri': [('fri', 1.0), ('compats', 1.0), ('mo', 0.814126196299157), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('bro', 0.8496284151715676), ('fami', 0.6375584385858655), ('best fri', 0.807654599975373)]
    }
}

In this dictionary, for example, there are tuples such as ('car', 0.7759192750865143) under key 'trans' and ('car', 0.6297345748705596) under key 'autori'. I want to remove ('car', 0.6297345748705596) because it has the lower score. In general, when the same word appears under several keys, only its highest-scoring tuple should be kept.

My desired output is:

new_a = {
    'trans': [('pickup', 1.0), ('boat', 1.0), ('plane', 1.0), ('walking', 1.0), ('foot', 1.0), ('train', 0.7455259731472191), ('trailer', 0.7227749512667475), ('car', 0.7759192750865143)],
    'actor': {
        'autori': [('smug', 1.0), ('pol', 1.0), ('traff', 1.0), ('local authori', 0.6894454471465952), ('driv', 0.6121365092485745)],
        'fam': [('fa', 1.0), ('mo', 1.0), ('bro', 1.0), ('son', 0.9925431812951816), ('sis', 0.9789254869156859), ('fami', 0.8392597243422916)],
        'fri': [('fri', 1.0), ('compats', 1.0), ('neighbor', 0.7433986938516075), ('parent', 0.32202418215134565), ('best fri', 0.807654599975373)]
    }
}

Is there a fast way to do this, or do we still need to loop through all values for each key?

Answer 1

Score: 1

Not sure it's the most efficient, but since you also mentioned "a simple solution" in a comment...

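I think the simplest method would involve looping through every tuple twice: once to collect the best scores, and again to filter out everything else. Something like new_a = onlyBest(a, bestRef=dict(sorted(getAllPairs(a)))) (see the function definitions below). Sorting the (word, score) pairs puts each word's highest-scoring pair last among that word's pairs, and dict() keeps the last value it sees for a repeated key, so bestRef maps every word to its best score.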

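def getAllPairs(obj):
    # A 2-tuple is treated as a (word, score) pair
    if isinstance(obj, tuple) and len(obj) == 2:
        return [obj]
    allPairs = []
    # For dicts, only the values can contain pairs
    if isinstance(obj, dict):
        obj = obj.values()
    # Recurse into any non-string iterable (lists, dict values, ...)
    if hasattr(obj, '__iter__') and not isinstance(obj, str):
        for i in obj:
            allPairs += getAllPairs(i)
    return allPairs

def onlyBest(obj, bestRef: dict):
    if isinstance(obj, list):
        # if all(isinstance(i, tuple) and len(i) == 2 for i in obj):
        # Keep a pair unless its score is below the best score recorded for its word
        return [i for i in obj if not i[1] < bestRef.get(i[0], i[1])]
    if isinstance(obj, dict):
        # Rebuild nested dicts, filtering every list inside them
        return {k: onlyBest(v, bestRef) for k, v in obj.items()}
    return obj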
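The bestRef=dict(sorted(...)) step can be checked on a tiny hand-made list. The pairs and scores below are hypothetical sample values, not the question's data; the snippet only illustrates why sorting and then building a dict leaves each word mapped to its maximum score.

```python
# Hypothetical sample pairs: 'car' occurs twice with different scores
pairs = [('car', 0.78), ('pickup', 1.0), ('car', 0.63)]

# Tuples sort by word first, then by score, so among the pairs for one word
# the highest-scoring pair comes last
ordered = sorted(pairs)
print(ordered)  # [('car', 0.63), ('car', 0.78), ('pickup', 1.0)]

# dict() overwrites a repeated key with each later value, so every word
# ends up mapped to its highest score
best = dict(ordered)
print(best)  # {'car': 0.78, 'pickup': 1.0}
```

This relies on the ascending sort order; the sort itself costs O(n log n) for n pairs, whereas tracking the maximum per word directly in a dict needs only one linear pass.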

Answer 2

Score: 0

To remove the lower values, you need to detect duplicates, compare their scores, keep track of the higher value, and remove a value whenever a bigger one is found. The algorithm you want needs at least O(n) time and O(n) space, where n is the total number of tuples.
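A minimal sketch of that approach, assuming (as in the question) nested dicts whose leaves are lists of (word, score) 2-tuples. The function names best_scores, drop_lower and remove_redundant are hypothetical. The first pass records the highest score per word in a dict; the second pass rebuilds the structure, keeping a tuple only if no other occurrence of its word scored higher. Each pass touches every tuple once, so the time is O(n), and the best-score dict takes O(n) extra space in the worst case.

```python
from typing import Any, Dict


def best_scores(obj: Any, best: Dict[str, float]) -> None:
    """First pass: record the highest score seen for each word."""
    if isinstance(obj, dict):
        for value in obj.values():
            best_scores(value, best)
    elif isinstance(obj, list):
        for word, score in obj:
            # Keep the larger of the stored score and this one
            if word not in best or score > best[word]:
                best[word] = score


def drop_lower(obj: Any, best: Dict[str, float]) -> Any:
    """Second pass: copy the structure, keeping only best-scoring tuples."""
    if isinstance(obj, dict):
        return {key: drop_lower(value, best) for key, value in obj.items()}
    if isinstance(obj, list):
        # A tuple survives if no other occurrence of its word scored higher
        return [(word, score) for word, score in obj if score >= best[word]]
    return obj


def remove_redundant(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data without tuples whose word scores higher elsewhere."""
    best: Dict[str, float] = {}
    best_scores(data, best)
    return drop_lower(data, best)


# Hypothetical sample: 'car' appears under two keys with different scores
sample = {
    'trans': [('boat', 1.0), ('car', 0.78)],
    'actor': {'autori': [('pol', 1.0), ('car', 0.63)]},
}
print(remove_redundant(sample))
# {'trans': [('boat', 1.0), ('car', 0.78)], 'actor': {'autori': [('pol', 1.0)]}}
```

Applied to the question's dictionary a, this drops ('car', 0.6297345748705596) from 'autori' and the 'mo', 'bro' and 'fami' tuples from 'fri', which matches the desired new_a. Tied scores are kept in every location, the same behaviour as the onlyBest filter in Answer 1, and the input is not modified.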

huangapple
  • Posted on 14 February 2023, 21:01:53
  • Please keep this link when reposting: https://go.coder-hub.com/75448227.html