Efficient and fast way to search through dict of dicts
Question
I have a dict of jobs, where each job maps to a dict of tags:
{
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "office drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}
I have roughly 21,000 more unique jobs to handle. How can I scan through them faster?
Is there a data structure that makes this kind of search faster, such as a lookup table for each tag?
I'm using Python 3.10.4.
NOTE: If it helps, everything is loaded at startup and does not change at runtime.
Here's my current code:
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

class NULL: pass

class Conditional(object):
    def __init__(self, data):
        self.dataset = data

    def find(self, *target, **tags):
        dataset = self.dataset.items()
        if target:
            dataset = (
                (entry, data) for entry, data in dataset
                if all((t in data) for t in target)
            )
        if tags:
            return [
                entry for entry, data in dataset
                if all(
                    (data.get(tag, NULL) == val) for tag, val in tags.items()
                )
            ]
        else:
            return [data[0] for data in dataset]

jobs = Conditional(test_data)
print(jobs.find(work_drive="high"))
# ['office_drone', 'farmer']
print(jobs.find("crime"))
# ['hacker', 'mugger', 'shop_owner']
print(jobs.find("crime", "morals"))
# ['mugger', 'shop_owner']
print(jobs.find("crime", morals="high"))
# ['shop_owner']
Answer 1
Score: 2
> And is there any type of data structure that makes this faster and better to scan through?
Yes, and it is called a dict =)
Just turn your dict into two dictionaries of sets: one keyed by tag, and another keyed by tag and then by tag value:
from collections import defaultdict
...
by_tag = defaultdict(set)
by_tag_value = defaultdict(lambda: defaultdict(set))
for job, tags in test_data.items():
    for tag, val in tags.items():
        by_tag[tag].add(job)
        by_tag_value[tag][val].add(job)

# example
# to search crime:high and morals
crime_high = by_tag_value["crime"]["high"]
morals = by_tag["morals"]
result = crime_high.intersection(morals)  # {'mugger', 'shop_owner'}
Then use these two indexes to fetch the set for each search condition and return the jobs that are present in all of the sets.
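As a rough sketch of how the two indexes could sit behind the same find(*target, **tags) interface as the question's code: the TagIndex class and its attribute names are hypothetical, not part of the answer, and the sketch assumes every tag value is hashable (as the "high"/"low" strings are).

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class TagIndex:
    """Hypothetical helper: builds both indexes once, then answers
    queries by intersecting sets instead of scanning every job."""

    def __init__(self, data):
        self.all_jobs = set(data)
        self.by_tag = defaultdict(set)
        self.by_tag_value = defaultdict(lambda: defaultdict(set))
        for job, tags in data.items():
            for tag, val in tags.items():
                self.by_tag[tag].add(job)
                self.by_tag_value[tag][val].add(job)

    def find(self, *targets, **tags):
        # .get() avoids creating empty entries for unknown tags,
        # which plain indexing on a defaultdict would do.
        candidates = [self.by_tag.get(t, set()) for t in targets]
        candidates += [
            self.by_tag_value.get(tag, {}).get(val, set())
            for tag, val in tags.items()
        ]
        if not candidates:
            # No conditions: every job matches.
            return set(self.all_jobs)
        # Always returns a new set, so the indexes are never exposed.
        return set.intersection(*candidates)


jobs = TagIndex(test_data)
print(jobs.find("crime", morals="high"))
```

Note that this version returns sets, so the result order is not the insertion order that the original list-based find produced. If order matters, the result can be sorted or filtered against the original dict's keys.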
Answer 2
Score: 1
To look up a first-level key in the dictionary, use either my_dict[key] or my_dict.get(key). They return the same value when the key exists; for a missing key, my_dict[key] raises KeyError while my_dict.get(key) returns None. So I think that is all you need for your target lookup.
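A small illustration of direct key lookup on a trimmed copy of the question's data may help here. The "pirate" key is made up for the example and is deliberately absent from the dict.

```python
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
}

found = test_data["hacker"]        # key exists: returns the inner dict
missing = test_data.get("pirate")  # made-up key: get() returns None

try:
    test_data["pirate"]            # same made-up key with [] raises KeyError
    raised = False
except KeyError:
    raised = True

print(found, missing, raised)
```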
If you want to look up which jobs mention a given tag at all, then making a lookup dictionary for that is reasonable. You could build a dictionary where each tag maps to a list of the jobs that have it.
The code below runs once at startup and builds the lookup from test_data. It loops through the entire dictionary, and every time it encounters a tag in a job's values, it adds that job's key to the list of jobs for that tag.
lookup = dict()
for k, v in test_data.items():
    for kk, vv in v.items():
        try:
            lookup[kk].append(k)
        except KeyError:
            lookup[kk] = [k]
Output (lookup):
{'crime': ['hacker', 'mugger', 'shop_owner'],
'morals': ['mugger', 'shop_owner'],
'work_drive': ['office_drone', 'farmer'],
'tolerance': ['office_drone']}
With this lookup table, you can ask 'Which jobs have a crime stat?' with lookup['crime'], which returns ['hacker', 'mugger', 'shop_owner'].
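The same table can also answer queries that combine several tags. The sketch below builds the lookup with collections.defaultdict instead of try/except (a swapped-in but equivalent construction) and adds a hypothetical helper, jobs_with_all, that is not part of the answer. It keeps the order of the first tag's list and filters it against the other tags.

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Build the tag -> [jobs] table once at startup.
lookup = defaultdict(list)
for job, tags in test_data.items():
    for tag in tags:
        lookup[tag].append(job)


def jobs_with_all(*tags):
    """Hypothetical helper: jobs that carry every given tag,
    in the order they appear in the original dict."""
    if not tags:
        return list(test_data)
    first, *rest = tags
    # Sets make the membership test for the remaining tags cheap.
    rest_sets = [set(lookup.get(t, ())) for t in rest]
    return [
        job for job in lookup.get(first, ())
        if all(job in s for s in rest_sets)
    ]


print(jobs_with_all("crime", "morals"))
```

This only indexes tag presence, so a query on a tag's value (for example morals="high") would still need a check against test_data or a second index keyed by value.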