在另一个映射列表中查找是否存在来自一个映射列表的键的高效方法

huangapple go评论88阅读模式
英文:

Effecient way to find if a key from a list of maps exists within another list of maps

问题

我有两个地图列表。我们称之为A,它存在于数据库中,B是来自传感器的实时结果。

A和B共享键/值对。

示例如下:

A = [
    {
        "created_at": "2020-09-19T17:25:29.547354",
        "id": 1,
        "ip_address": "192.168.1.1",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 2,
        "ip_address": "192.168.1.2",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 3,
        "ip_address": "192.168.1.3",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    }
]

B = [
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.1',
        'status': True
    },
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.2',
        'status': True
    }
]

通过值ip_address,找出B中与A相比缺失的任何地图的最佳方法是什么。

例如,通过上面的内容我们可以看出,包含ip_address为"192.168.1.3"的地图在B中不存在。目标是尝试找到两者之间不存在的值的列表(如果有的话)。

预期输出是一个列表,如:["192.168.1.3"]

英文:

I have two lists of maps. We'll call them A, which exists within a database, and B, which is live results from a sensor.

A shares Key/Values From B

Example looks like:

A = [
    {
        "created_at": "2020-09-19T17:25:29.547354",
        "id": 1,
        "ip_address": "192.168.1.1",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 2,
        "ip_address": "192.168.1.2",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 3,
        "ip_address": "192.168.1.3",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    }
]

B = [
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.1',
        'status': True
    },
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.2',
        'status': True
    }
]

What's the best way way to find out any missing maps from B compared to A by the Value ip_address.

For example, we can tell by looking at the above that the map which contains the ip_address "192.168.1.3" doesn't exist within B. The aim is to try and find a list of values which don't exist between the two, if any.

The expected output is a list like: ["192.168.1.3"]

答案1

得分: 1

我提供了一个半高效的解决方案:

package main

import (
	"fmt"
)

func main() {
	a := []map[string]interface{}{
		{
			"created_at":  "2020-09-19T17:25:29.547354",
			"id":          1,
			"ip_address":  "192.168.1.1",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
		{
			"created_at":  "2020-09-19T17:25:29.564472",
			"id":          2,
			"ip_address":  "192.168.1.2",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
		{
			"created_at":  "2020-09-19T17:25:29.564472",
			"id":          3,
			"ip_address":  "192.168.1.3",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
	}

	b := []map[string]interface{}{
		{
			"mac_address": "xx:xx:xx:xx:xx:xx",
			"ip_address":  "192.168.1.1",
			"status":      true,
		},
		{
			"mac_address": "xx:xx:xx:xx:xx:xx",
			"ip_address":  "192.168.1.2",
			"status":      true,
		},
	}

	c, d := collectIpAddresses(a), collectIpAddresses(b)

	var missing []string

	for k := range c {
		if !d[k] {
			missing = append(missing, k)
		}
	}

	for k := range d {
		if !c[k] {
			missing = append(missing, k)
		}
	}

	fmt.Println(missing)
}

// 使用更高效的数据结构存储数据以便进行搜索
func collectIpAddresses(a []map[string]interface{}) map[string]bool {
	b := make(map[string]bool, len(a))
	for _, v := range a {
		b[v["ip_address"].(string)] = true
	}
	return b
}

这是一个很好的解决方案,因为它提供了O(m+n)的复杂度(其中ma的长度,nb的长度)。

相反,那个使用了循环嵌套的解决方案将具有O(m*n)的复杂度。这种复杂度会大大降低算法在较大数据集上的性能。

尽管如此,由于分配内存非常慢,在OP提供的数据集上,最后一个解决方案将提供更好的结果。这可能是一个陷阱,取决于要迭代的数据集的大小。

英文:

I come up with semi efficient solution:

package main
import (
"fmt"
)
func main() {
a := []map[string]interface{}{
{
"created_at":  "2020-09-19T17:25:29.547354",
"id":          1,
"ip_address":  "192.168.1.1",
"mac_address": "xx:xx:xx:xx:xx:xx",
},
{
"created_at":  "2020-09-19T17:25:29.564472",
"id":          2,
"ip_address":  "192.168.1.2",
"mac_address": "xx:xx:xx:xx:xx:xx",
},
{
"created_at":  "2020-09-19T17:25:29.564472",
"id":          3,
"ip_address":  "192.168.1.3",
"mac_address": "xx:xx:xx:xx:xx:xx",
},
}
b := []map[string]interface{}{
{
"mac_address": "xx:xx:xx:xx:xx:xx",
"ip_address":  "192.168.1.1",
"status":      true,
},
{
"mac_address": "xx:xx:xx:xx:xx:xx",
"ip_address":  "192.168.1.2",
"status":      true,
},
}
c, d := collectIpAddresses(a), collectIpAddresses(b)
var missing []string
for k := range c {
if !d[k] {
missing = append(missing, k)
}
}
for k := range d {
if !c[k] {
missing = append(missing, k)
}
}
fmt.Println(missing)
}
// stores data in more efficient data-structure for searching
func collectIpAddresses(a []map[string]interface{}) map[string]bool {
b := make(map[string]bool, len(a))
for _, v := range a {
b[v["ip_address"].(string)] = true
}
return b
}

This is a good solution because it provides O(m+n) complexity (where m is len(a) and n is len(b)).

In contrary, that solution which uses a loop within a loop will have a complexity of O(m*n). That complexity will dramatically reduce the performance of the algorithm on larger datasets.

Although, because allocating is extremely slow, on a dataset as provided by OP, the last solution will provide better results. This might be a catch depending the size of the dataset to iterate.

huangapple
  • 本文由 发表于 2021年7月17日 02:38:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/68413865.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定