2023-02-09 02:52:16


Efficient and fast way to search through dict of dicts

Question


I have a dict of jobs, each of which holds its own dict of tags:

{
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "office drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

I have roughly 21,000 more unique jobs to handle.

How would I go about scanning through them faster?

Is there any type of data structure that makes this faster and better to scan through, such as a lookup table for each of the tags?

I'm using Python 3.10.4.

NOTE: If it helps, everything is loaded at the start of runtime and doesn't change during runtime at all.

Here's my current code:

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class NULL:
    """Sentinel for a missing tag, so it never equals a real value."""


class Conditional:
    def __init__(self, data):
        self.dataset = data

    def find(self, *target, **tags):
        dataset = self.dataset.items()

        # Keep only the jobs that have every tag named in `target`.
        if target:
            dataset = (
                (entry, data) for entry, data in dataset
                if all(t in data for t in target)
            )
        # Keep only the jobs that match every tag=value pair in `tags`.
        if tags:
            return [
                entry for entry, data in dataset
                if all(data.get(tag, NULL) == val for tag, val in tags.items())
            ]
        return [entry for entry, _ in dataset]


jobs = Conditional(test_data)
print(jobs.find(work_drive="high"))      # ['office_drone', 'farmer']
print(jobs.find("crime"))                # ['hacker', 'mugger', 'shop_owner']
print(jobs.find("crime", "morals"))      # ['mugger', 'shop_owner']
print(jobs.find("crime", morals="high")) # ['shop_owner']

Answer 1

Score: 2


> And is there any type of data structure that makes this faster and better to scan through?

Yes, and it is called a dict =)

Just turn your dict into two dictionaries, one keyed by tag and another keyed by tag and then by tag value, each holding sets of job names:

from collections import defaultdict
...  # test_data is defined as in the question
by_tag = defaultdict(set)                             # tag -> {jobs}
by_tag_value = defaultdict(lambda: defaultdict(set))  # tag -> value -> {jobs}
for job, tags in test_data.items():
    for tag, val in tags.items():
        by_tag[tag].add(job)
        by_tag_value[tag][val].add(job)
# example
# to search crime:high and morals
crime_high = by_tag_value["crime"]["high"]
morals = by_tag["morals"]
result = crime_high.intersection(morals)  # {'mugger', 'shop_owner'}

Then use them to look up the sets you need and return the jobs that are present in all of those sets.
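As a rough sketch of how the two indexes could sit behind the question's find(*target, **tags) interface: the class name IndexedConditional and its attributes are hypothetical and not part of the original answer. It assumes tag values are hashable, and it returns matches sorted alphabetically instead of in insertion order, because sets are unordered.

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class IndexedConditional:
    """Hypothetical indexed variant of the question's Conditional class."""

    def __init__(self, data):
        self.jobs = list(data)  # job names in their original order
        self.by_tag = defaultdict(set)
        self.by_tag_value = defaultdict(lambda: defaultdict(set))
        # Build both indexes once, up front.
        for job, tags in data.items():
            for tag, val in tags.items():
                self.by_tag[tag].add(job)
                self.by_tag_value[tag][val].add(job)

    def find(self, *target, **tags):
        # .get() avoids inserting empty entries for unknown tags or values.
        sets = [self.by_tag.get(t, set()) for t in target]
        sets += [self.by_tag_value.get(t, {}).get(v, set()) for t, v in tags.items()]
        if not sets:
            return list(self.jobs)
        # Start from the smallest set so the intersection does less work.
        sets.sort(key=len)
        matches = sets[0].intersection(*sets[1:])
        return sorted(matches)  # sorted only to get a stable order


jobs = IndexedConditional(test_data)
print(jobs.find(work_drive="high"))       # ['farmer', 'office_drone']
print(jobs.find("crime"))                 # ['hacker', 'mugger', 'shop_owner']
print(jobs.find("crime", "morals"))       # ['mugger', 'shop_owner']
print(jobs.find("crime", morals="high"))  # ['shop_owner']
```

Each query then touches only the jobs listed under the requested tags instead of scanning all of the roughly 21,000 entries, which fits the note that the data is loaded once and never changes.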

Answer 2

Score: 1


To look up a first-level key in the dictionary, use either my_dict[key] or my_dict.get(key). They return the same value when the key exists; for a missing key, my_dict[key] raises KeyError while my_dict.get(key) returns None. So I think that is all you need for your target lookup.
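A small illustration of the two lookup styles, using a cut-down copy of the question's data. The job name "pirate" is made up here purely to show the missing-key case.

```python
test_data = {
    "hacker": {"crime": "high"},
    "farmer": {"work_drive": "high"},
}

# Both forms return the same value when the key exists.
print(test_data["hacker"])          # {'crime': 'high'}
print(test_data.get("hacker"))      # {'crime': 'high'}

# For a missing key, .get() returns None (or a default you supply) ...
print(test_data.get("pirate"))      # None
print(test_data.get("pirate", {}))  # {}

# ... while indexing raises KeyError.
try:
    test_data["pirate"]
except KeyError as exc:
    print("missing job:", exc)      # missing job: 'pirate'
```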

Then, if you want to look up which jobs mention a given tag at all, I think making a lookup dictionary for that is reasonable. You could make a dictionary in which each tag maps to a list of the jobs that have it.

The code below would be run once at the beginning and builds the lookup from test_data. It loops through the entire dictionary, and any time it encounters a tag in the values for an item, it adds that item's key to the list of jobs for that tag:

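lookup = dict()
for k, v in test_data.items():      # k: job name, v: that job's tags
    for kk, vv in v.items():        # kk: tag name, vv: tag value (unused)
        try:
            lookup[kk].append(k)
        except KeyError:
            # First time this tag is seen: start its list of jobs.
            lookup[kk] = [k]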

Output (lookup):

{'crime': ['hacker', 'mugger', 'shop_owner'],
 'morals': ['mugger', 'shop_owner'],
 'work_drive': ['office_drone', 'farmer'],
 'tolerance': ['office_drone']}

With this lookup table, you could ask "Which jobs have a crime stat?" with lookup['crime'], which would return ['hacker', 'mugger', 'shop_owner'].
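If you need jobs that carry several tags at once, one possible extension is sketched below. The helper jobs_with_all_tags is hypothetical and not part of this answer; it builds the same lookup with setdefault and keeps the order of the first tag's list.

```python
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Same lookup as above, built with setdefault instead of try/except.
lookup = {}
for job, tags in test_data.items():
    for tag in tags:
        lookup.setdefault(tag, []).append(job)


def jobs_with_all_tags(lookup, *tags):
    """Return the jobs listed under every given tag, in the first tag's order."""
    if not tags:
        return []
    first, *rest = (lookup.get(tag, []) for tag in tags)
    # Sets make the membership checks against the remaining tags cheap.
    rest_sets = [set(jobs) for jobs in rest]
    return [job for job in first if all(job in s for s in rest_sets)]


print(jobs_with_all_tags(lookup, "crime"))               # ['hacker', 'mugger', 'shop_owner']
print(jobs_with_all_tags(lookup, "crime", "morals"))     # ['mugger', 'shop_owner']
print(jobs_with_all_tags(lookup, "crime", "tolerance"))  # []
```

Storing sets in the lookup from the start would avoid the per-query conversion, at the cost of losing the insertion order that the lists preserve.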
