Efficient way to find if a key from a list of maps exists within another list of maps
Question
I have two lists of maps. We'll call them A, which exists within a database, and B, which contains live results from a sensor.
A and B share key/value pairs.
For example:
A = [
    {
        "created_at": "2020-09-19T17:25:29.547354",
        "id": 1,
        "ip_address": "192.168.1.1",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 2,
        "ip_address": "192.168.1.2",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    },
    {
        "created_at": "2020-09-19T17:25:29.564472",
        "id": 3,
        "ip_address": "192.168.1.3",
        "mac_address": "xx:xx:xx:xx:xx:xx",
    }
]
B = [
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.1',
        'status': True
    },
    {
        'mac_address': 'xx:xx:xx:xx:xx:xx',
        'ip_address': '192.168.1.2',
        'status': True
    }
]
What's the best way to find any maps missing from B, compared to A, by the value of ip_address?
For example, looking at the above we can tell that the map containing the ip_address "192.168.1.3" doesn't exist within B. The aim is to find the list of values, if any, that are not present in both lists.
The expected output is a list like: ["192.168.1.3"]
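The question only asks for addresses that are in A but absent from B. Below is a minimal sketch of that one-directional check in Go (the language of the answer that follows). It assumes both lists have already been decoded into `[]map[string]interface{}` and that `ip_address` is always stored as a string; the function name `missingFromB` is hypothetical.

```go
package main

import "fmt"

// missingFromB (hypothetical name) returns, in a's order, every ip_address
// value found in a but in no map of b. Maps without a string "ip_address"
// are skipped rather than causing a panic.
func missingFromB(a, b []map[string]interface{}) []string {
	// Index b's addresses once: O(len(b)) work, then O(1) average lookups.
	seen := make(map[string]struct{}, len(b))
	for _, m := range b {
		if ip, ok := m["ip_address"].(string); ok {
			seen[ip] = struct{}{}
		}
	}

	// Walk a in order so the result is deterministic.
	var missing []string
	for _, m := range a {
		ip, ok := m["ip_address"].(string)
		if !ok {
			continue
		}
		if _, found := seen[ip]; !found {
			missing = append(missing, ip)
		}
	}
	return missing
}

func main() {
	a := []map[string]interface{}{
		{"id": 1, "ip_address": "192.168.1.1"},
		{"id": 2, "ip_address": "192.168.1.2"},
		{"id": 3, "ip_address": "192.168.1.3"},
	}
	b := []map[string]interface{}{
		{"ip_address": "192.168.1.1", "status": true},
		{"ip_address": "192.168.1.2", "status": true},
	}
	fmt.Println(missingFromB(a, b)) // prints [192.168.1.3]
}
```

Building the set from B first keeps the total work proportional to len(A) + len(B), and iterating over the slice A (rather than over a map) preserves A's original order. An address that appears twice in A and is absent from B is reported twice.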
Answer 1
Score: 1
I came up with a semi-efficient solution:
package main

import (
	"fmt"
	"sort"
)

func main() {
	a := []map[string]interface{}{
		{
			"created_at":  "2020-09-19T17:25:29.547354",
			"id":          1,
			"ip_address":  "192.168.1.1",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
		{
			"created_at":  "2020-09-19T17:25:29.564472",
			"id":          2,
			"ip_address":  "192.168.1.2",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
		{
			"created_at":  "2020-09-19T17:25:29.564472",
			"id":          3,
			"ip_address":  "192.168.1.3",
			"mac_address": "xx:xx:xx:xx:xx:xx",
		},
	}
	b := []map[string]interface{}{
		{
			"mac_address": "xx:xx:xx:xx:xx:xx",
			"ip_address":  "192.168.1.1",
			"status":      true,
		},
		{
			"mac_address": "xx:xx:xx:xx:xx:xx",
			"ip_address":  "192.168.1.2",
			"status":      true,
		},
	}

	c, d := collectIpAddresses(a), collectIpAddresses(b)

	var missing []string
	// IP addresses present in a but not in b
	for k := range c {
		if !d[k] {
			missing = append(missing, k)
		}
	}
	// IP addresses present in b but not in a
	for k := range d {
		if !c[k] {
			missing = append(missing, k)
		}
	}
	// Map iteration order is randomized in Go, so sort for stable output.
	sort.Strings(missing)
	fmt.Println(missing) // [192.168.1.3]
}

// stores data in more efficient data-structure for searching
func collectIpAddresses(a []map[string]interface{}) map[string]bool {
	b := make(map[string]bool, len(a))
	for _, v := range a {
		// comma-ok form skips maps without a string "ip_address" instead of panicking
		if ip, ok := v["ip_address"].(string); ok {
			b[ip] = true
		}
	}
	return b
}
This is a good solution because it has O(m+n) complexity (where m is len(a) and n is len(b)): each list is scanned once, and map lookups take constant time on average.
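The same set-based idea can be packaged as a reusable helper when other keys or types are involved. This sketch assumes Go 1.18 or later (for type parameters); `diff` and `ips` are hypothetical names. Unlike the answer's code, it returns the two directions in separate slices, so the caller can tell which side an address is missing from.

```go
package main

import "fmt"

// diff (hypothetical) returns the elements only in xs and those only in ys.
// Building both sets is O(m+n) and each lookup is O(1) on average, so the
// whole call runs in O(m+n) expected time. Output follows input order.
func diff[K comparable](xs, ys []K) (onlyX, onlyY []K) {
	inX := make(map[K]struct{}, len(xs))
	for _, x := range xs {
		inX[x] = struct{}{}
	}
	inY := make(map[K]struct{}, len(ys))
	for _, y := range ys {
		inY[y] = struct{}{}
	}
	for _, x := range xs {
		if _, ok := inY[x]; !ok {
			onlyX = append(onlyX, x)
		}
	}
	for _, y := range ys {
		if _, ok := inX[y]; !ok {
			onlyY = append(onlyY, y)
		}
	}
	return onlyX, onlyY
}

// ips (hypothetical) extracts the string "ip_address" values, skipping
// maps where the key is absent or not a string.
func ips(ms []map[string]interface{}) []string {
	out := make([]string, 0, len(ms))
	for _, m := range ms {
		if ip, ok := m["ip_address"].(string); ok {
			out = append(out, ip)
		}
	}
	return out
}

func main() {
	a := []map[string]interface{}{
		{"id": 1, "ip_address": "192.168.1.1"},
		{"id": 2, "ip_address": "192.168.1.2"},
		{"id": 3, "ip_address": "192.168.1.3"},
	}
	b := []map[string]interface{}{
		{"ip_address": "192.168.1.1", "status": true},
		{"ip_address": "192.168.1.2", "status": true},
	}
	onlyA, onlyB := diff(ips(a), ips(b))
	fmt.Println(onlyA, onlyB) // prints [192.168.1.3] []
}
```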
By contrast, the solution that uses a loop within a loop has a complexity of O(m*n), which will dramatically reduce performance on larger datasets.
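The nested-loop solution mentioned above is not reproduced on this page. The following is a minimal sketch of what such an approach might look like, using the hypothetical name `missingNested` and the same string-`ip_address` assumption as the answer: it allocates no lookup set, but compares every element of a against every element of b.

```go
package main

import "fmt"

// missingNested (hypothetical) reports each ip_address in a that no map in b
// contains. It scans all of b for every element of a, so it performs
// O(m*n) comparisons, but it allocates nothing beyond the result slice.
func missingNested(a, b []map[string]interface{}) []string {
	var missing []string
	for _, ma := range a {
		ipA, ok := ma["ip_address"].(string)
		if !ok {
			continue
		}
		found := false
		for _, mb := range b {
			if ipB, ok := mb["ip_address"].(string); ok && ipB == ipA {
				found = true
				break
			}
		}
		if !found {
			missing = append(missing, ipA)
		}
	}
	return missing
}

func main() {
	a := []map[string]interface{}{
		{"id": 1, "ip_address": "192.168.1.1"},
		{"id": 2, "ip_address": "192.168.1.2"},
		{"id": 3, "ip_address": "192.168.1.3"},
	}
	b := []map[string]interface{}{
		{"ip_address": "192.168.1.1", "status": true},
		{"ip_address": "192.168.1.2", "status": true},
	}
	fmt.Println(missingNested(a, b)) // prints [192.168.1.3]
}
```

For a handful of elements, as in the OP's example, skipping the set allocation can outweigh the extra comparisons; as m and n grow, the quadratic comparison count dominates.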
That said, because memory allocation is comparatively slow, the nested-loop solution will likely be faster on a dataset as small as the one the OP provided. Which approach wins depends on the size of the dataset being iterated.