Efficient and fast way to search through dict of dicts

Question

I have a dict of jobs, each of which holds a dict of tags:

{
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "office drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

I have roughly 21,000 more unique jobs to handle.

How would I go about scanning through them faster?

Is there any data structure that makes this faster and better to scan through, such as a lookup table for each of the tags?

I'm using Python 3.10.4.

NOTE: If it helps, everything is loaded at the start of runtime and doesn't change at all during runtime.

Here's my current code:

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Sentinel for a missing tag, so that a stored value of None can still be matched.
class NULL: pass

class Conditional:
    def __init__(self, data):
        self.dataset = data

    def find(self, *target, **tags):
        dataset = self.dataset.items()

        # Keep only the entries that have every tag named in target.
        if target:
            dataset = (
                (entry, data) for entry, data in dataset
                if all(t in data for t in target)
            )

        # Keep only the entries that match every tag=value pair.
        if tags:
            return [
                entry for entry, data in dataset
                if all(data.get(tag, NULL) == val for tag, val in tags.items())
            ]
        return [entry for entry, _ in dataset]

jobs = Conditional(test_data)

print(jobs.find(work_drive="high"))
# ['office_drone', 'farmer']
print(jobs.find("crime"))
# ['hacker', 'mugger', 'shop_owner']
print(jobs.find("crime", "morals"))
# ['mugger', 'shop_owner']
print(jobs.find("crime", morals="high"))
# ['shop_owner']

Answer 1

Score: 2

> And is there any type of data structure that makes this faster and better to scan through?

Yes, and it is called a dict =)

Just turn your dict into two dictionaries whose values are sets of jobs: one keyed by tag, and another keyed by tag and then by tag value:

from collections import defaultdict

... 

by_tag = defaultdict(set)
by_tag_value = defaultdict(lambda: defaultdict(set))

for job, tags in test_data.items():
    for tag, val in tags.items():
        by_tag[tag].add(job)
        by_tag_value[tag][val].add(job)

# Example: jobs with crime == "high" that also have a "morals" tag (any value)

crime_high = by_tag_value["crime"]["high"]
morals = by_tag["morals"]
result = crime_high.intersection(morals)  # {'mugger', 'shop_owner'}

Then, for each search condition, look up the matching set and return the jobs that are present in all of the sets.
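As a rough sketch of how the two indexes could sit behind the same find(*target, **tags) interface as the question's class: the class name TagIndex and the attribute all_jobs are made up for this illustration and are not part of the answer above. Note that it returns a set, so the ordering of the original list-based results is not preserved.

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class TagIndex:
    """Hypothetical wrapper: builds both indexes once, then answers queries."""

    def __init__(self, data):
        self.all_jobs = set(data)  # needed for the "no conditions" case
        self.by_tag = defaultdict(set)
        self.by_tag_value = defaultdict(lambda: defaultdict(set))
        for job, tags in data.items():
            for tag, val in tags.items():
                self.by_tag[tag].add(job)
                self.by_tag_value[tag][val].add(job)

    def find(self, *target, **tags):
        empty = frozenset()
        # .get() avoids inserting empty entries into the defaultdicts
        # when a query mentions an unknown tag or value.
        sets = [self.by_tag.get(t, empty) for t in target]
        sets += [
            self.by_tag_value.get(tag, {}).get(val, empty)
            for tag, val in tags.items()
        ]
        if not sets:
            return set(self.all_jobs)
        # Start from the smallest set so the intersection does less work.
        sets.sort(key=len)
        return set(sets[0]).intersection(*sets[1:])


index = TagIndex(test_data)
print(sorted(index.find("crime", "morals")))      # ['mugger', 'shop_owner']
print(sorted(index.find("crime", morals="high")))  # ['shop_owner']
```

Since the data does not change after loading, the indexes only need to be built once at startup; each query then touches only the sets for the tags it names instead of scanning every job.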

Answer 2

Score: 1

To look up a first-level key in the dictionary, use either my_dict[key] or my_dict.get(key). They return the same value when the key exists; for a missing key, my_dict[key] raises KeyError, while my_dict.get(key) returns None. So I think you just want to do that for your target lookup.
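A small illustration of the two first-level lookups, using a cut-down copy of the question's data (the job name "pirate" is just a made-up missing key). Note that the two forms only differ when the key is absent:

```python
test_data = {
    "hacker": {"crime": "high"},
    "farmer": {"work_drive": "high"},
}

# Existing key: both forms return the same inner dict.
print(test_data["hacker"])      # {'crime': 'high'}
print(test_data.get("hacker"))  # {'crime': 'high'}

# Missing key: .get() returns None (or a default you pass in) ...
print(test_data.get("pirate"))      # None
print(test_data.get("pirate", {}))  # {}

# ... while indexing raises KeyError.
try:
    test_data["pirate"]
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```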

Then, if you want to look up which jobs have a given tag, making a lookup dictionary for that is reasonable. You could make a dictionary where each tag maps to the list of jobs that have it.

The code below would be run once at the beginning to build the lookup from test_data. It loops through the entire dictionary, and every time it encounters a tag in an item's value, it adds that item's key to the list of jobs for that tag:

lookup = {}
for job, tags in test_data.items():
    for tag in tags:
        # Create the list the first time a tag is seen, then append the job.
        lookup.setdefault(tag, []).append(job)

Output (lookup):

{'crime': ['hacker', 'mugger', 'shop_owner'],
 'morals': ['mugger', 'shop_owner'],
 'work_drive': ['office_drone', 'farmer'],
 'tolerance': ['office_drone']}

With this lookup table, you could ask "Which jobs have a crime stat?" with lookup['crime'], which would return ['hacker', 'mugger', 'shop_owner'].
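If you need jobs that have several tags at once, the per-tag lists can be combined. The following is only a sketch under that assumption; the helper name jobs_with_all is invented here and does not appear in the answer. It walks the shortest list and checks membership against sets built from the others, so the result keeps the order of that shortest list.

```python
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Same lookup table as above: tag -> list of jobs that have it.
lookup = {}
for job, tags in test_data.items():
    for tag in tags:
        lookup.setdefault(tag, []).append(job)


def jobs_with_all(lookup, *tags):
    """Hypothetical helper: jobs that have every tag in `tags`."""
    if not tags:
        return []
    # Unknown tags map to an empty list, which makes the result empty.
    candidates = sorted((lookup.get(tag, []) for tag in tags), key=len)
    first, rest = candidates[0], [set(c) for c in candidates[1:]]
    return [job for job in first if all(job in s for s in rest)]


print(jobs_with_all(lookup, "crime", "morals"))  # ['mugger', 'shop_owner']
print(jobs_with_all(lookup, "crime", "tolerance"))  # []
```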

huangapple
  • Posted on February 9, 2023, 02:52:16
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/75390455.html