Efficient and fast way to search through a dict of dicts

Question

I have a dict of jobs, where each job maps to its own dict of tags:

    {
        "hacker": {"crime": "high"},
        "mugger": {"crime": "high", "morals": "low"},
        "office drone": {"work_drive": "high", "tolerance": "high"},
        "farmer": {"work_drive": "high"},
    }

I have roughly 21,000 more unique jobs to handle.

How would I go about scanning through them faster?

Is there a data structure that makes this faster and better to scan through, such as a lookup table for each of the tags?

I'm using Python 3.10.4.

NOTE: If it helps, everything is loaded at the start of runtime and doesn't change during runtime at all.

Here's my current code:

    test_data = {
        "hacker": {"crime": "high"},
        "mugger": {"crime": "high", "morals": "low"},
        "shop_owner": {"crime": "high", "morals": "high"},
        "office_drone": {"work_drive": "high", "tolerance": "high"},
        "farmer": {"work_drive": "high"},
    }

    class NULL: pass  # sentinel meaning "tag not present"

    class Conditional(object):
        def __init__(self, data):
            self.dataset = data

        def find(self, *target, **tags):
            dataset = self.dataset.items()
            if target:
                # keep only jobs that have every requested tag
                dataset = (
                    (entry, data) for entry, data in dataset
                    if all((t in data) for t in target)
                )
            if tags:
                # keep only jobs whose tag values match exactly
                return [
                    entry for entry, data in dataset
                    if all(
                        (data.get(tag, NULL) == val) for tag, val in tags.items()
                    )
                ]
            else:
                return [data[0] for data in dataset]

    jobs = Conditional(test_data)
    print(jobs.find(work_drive="high"))
    # ['office_drone', 'farmer']
    print(jobs.find("crime"))
    # ['hacker', 'mugger', 'shop_owner']
    print(jobs.find("crime", "morals"))
    # ['mugger', 'shop_owner']
    print(jobs.find("crime", morals="high"))
    # ['shop_owner']

Answer 1

Score: 2

> And is there any type of data structure that makes this faster and better to scan through?

Yes. And it is called dict =)

Just turn your dict into two dictionaries whose values are sets of job names: one keyed by tag, and another keyed by tag and then by tag value:

    from collections import defaultdict
    ...
    by_tag = defaultdict(set)
    by_tag_value = defaultdict(lambda: defaultdict(set))
    for job, tags in test_data.items():
        for tag, val in tags.items():
            by_tag[tag].add(job)
            by_tag_value[tag][val].add(job)

    # example
    # to search crime:high and morals
    crime_high = by_tag_value["crime"]["high"]
    morals = by_tag["morals"]
    result = crime_high.intersection(morals)  # {'mugger', 'shop_owner'}

Then, for each query, look up the set for every requested tag or tag/value pair and return the jobs that are present in all of those sets.
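A general find with the same call signature as the question's Conditional.find could be sketched as follows. The class name TagIndex and its attribute names are made up for illustration and are not part of the answer. The technique is the one described above: build both indexes once at start-up, then intersect sets per query. One assumption to note is that it returns a set, so the result order is not the insertion order that the original list-based find gave.

```python
from collections import defaultdict

test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}


class TagIndex:
    """Hypothetical wrapper around the two indexes (name is illustrative)."""

    def __init__(self, data):
        # Built once; the question states the data never changes at runtime.
        self.all_jobs = set(data)
        self.by_tag = defaultdict(set)
        self.by_tag_value = defaultdict(lambda: defaultdict(set))
        for job, tags in data.items():
            for tag, val in tags.items():
                self.by_tag[tag].add(job)
                self.by_tag_value[tag][val].add(job)

    def find(self, *target, **tags):
        # .get() avoids creating empty entries for unknown tags or values.
        candidates = [self.by_tag.get(t, set()) for t in target]
        candidates += [
            self.by_tag_value.get(tag, {}).get(val, set())
            for tag, val in tags.items()
        ]
        if not candidates:
            return set(self.all_jobs)
        # Starting from the smallest set keeps the intersection cheap.
        candidates.sort(key=len)
        return set.intersection(*candidates)


index = TagIndex(test_data)
print(sorted(index.find("crime", morals="high")))  # ['shop_owner']
print(sorted(index.find(work_drive="high")))       # ['farmer', 'office_drone']
```

Each query now touches only the sets for the requested tags instead of scanning every job, which is where the speed-up over the original loop would come from.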

Answer 2

Score: 1

To look up a first-level key in the dictionary, use either my_dict[key] or my_dict.get(key). They return the same value for an existing key; for a missing key, my_dict[key] raises KeyError while my_dict.get(key) returns None. So I think you just want to do that with your target lookup.
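One detail about the two lookup forms: they behave the same for a key that exists and differ only when the key is missing. A small sketch, using my_dict as a placeholder name and "pirate" as a made-up missing key:

```python
my_dict = {"hacker": {"crime": "high"}}

# Both forms return the same value when the key exists.
assert my_dict["hacker"] == my_dict.get("hacker")

# For a missing key, .get() returns None, or the default you pass in ...
missing = my_dict.get("pirate")
fallback = my_dict.get("pirate", {})

# ... while square brackets raise KeyError.
try:
    my_dict["pirate"]
    raised = False
except KeyError:
    raised = True

print(missing, fallback, raised)  # None {} True
```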

Then, if you want to look up which jobs have a given tag, I think making a lookup dictionary for that is reasonable. You could make a dictionary where each tag maps to a list of the jobs that have it.

The code below would be run once at the beginning to build the lookup from test_data. It loops through the entire dictionary, and any time it encounters a tag in an item's value, it adds that item's key to the list of jobs for that tag.

    lookup = dict()
    for k, v in test_data.items():
        for kk, vv in v.items():
            try:
                lookup[kk].append(k)
            except KeyError:
                lookup[kk] = [k]

Output (lookup):

    {'crime': ['hacker', 'mugger', 'shop_owner'],
     'morals': ['mugger', 'shop_owner'],
     'work_drive': ['office_drone', 'farmer'],
     'tolerance': ['office_drone']}

With this lookup table, you could ask 'Which jobs have a crime stat?' with lookup['crime'], which would return ['hacker', 'mugger', 'shop_owner'].
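To ask about several tags at once with this list-based table, one option is to intersect the lists as sets. The helper name jobs_with_all is hypothetical and not part of the answer. The sketch rebuilds lookup with dict.setdefault only so that it runs on its own; the resulting table is the same as the one built above.

```python
test_data = {
    "hacker": {"crime": "high"},
    "mugger": {"crime": "high", "morals": "low"},
    "shop_owner": {"crime": "high", "morals": "high"},
    "office_drone": {"work_drive": "high", "tolerance": "high"},
    "farmer": {"work_drive": "high"},
}

# Same table as above, built with setdefault instead of try/except.
lookup = {}
for job, tags in test_data.items():
    for tag in tags:
        lookup.setdefault(tag, []).append(job)


def jobs_with_all(*wanted):
    """Return the jobs that carry every tag in `wanted` (illustrative helper)."""
    if not wanted:
        return list(test_data)
    first = lookup.get(wanted[0], [])
    common = set(first)
    for tag in wanted[1:]:
        common &= set(lookup.get(tag, []))
    # Filter the first list so the result keeps the original insertion order.
    return [job for job in first if job in common]


print(jobs_with_all("crime"))            # ['hacker', 'mugger', 'shop_owner']
print(jobs_with_all("crime", "morals"))  # ['mugger', 'shop_owner']
```

Storing sets in the table from the start would avoid the per-query list-to-set conversion, at the cost of losing list order.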

huangapple
  • Posted on February 9, 2023, 02:52:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75390455.html