Two-level nested aggregation in Elasticsearch based on a condition over the first-level aggregation

Question

My ES document structure is like this:

{
  "_index": "my_index",
  "_type": "_doc",
  "_id": "1296",
  "_version": 1,
  "_seq_no": 431,
  "_primary_term": 1,
  "_routing": "1296",
  "found": true,
  "_source": {
    "id": 1296,
    "test_name": "abc",
    "test_id": 513,
    "inventory_arr": [
      {
        "city": "bangalore",
        "after_tat": 168,
        "before_tat": 54,
        "popularity_score": 15,
        "rank": 0,
        "discounted_price": 710,
        "labs": [
          {
            "lab_id": 395,
            "lab_name": "Prednalytics Laboratory",
            "lab_rating": 34
          },
          {
            "lab_id": 363,
            "lab_name": "Neuberg Diagnostics",
            "lab_rating": 408
          }
        ]
      },
      {
        "city": "mumbai",
        "after_tat": 168,
        "before_tat": 54,
        "popularity_score": 15,
        "rank": 0,
        "discounted_price": 710,
        "labs": [
          {
            "lab_id": 395,
            "lab_name": "Prednalytics Laboratory",
            "lab_rating": 34
          },
          {
            "lab_id": 380,
            "lab_name": "Neuberg Diagnostics",
            "lab_rating": 408
          }
        ]
      }
    ]
  }
}

I want to know how many tests are performed in each lab in Bangalore.
The problem I'm facing is this:
if I group by lab_id using a nested aggregation, it groups by each lab regardless of which city the lab is in.

Suppose there is only one document in my index; then, for the city Bangalore, I expect a result like this:

[
  {"key": 395, "doc_count": 1},
  {"key": 363, "doc_count": 1}
]

Note: the same lab_id can appear under more than one city.

Answer 1

Score: 1

This problem can be solved using a filter aggregation.

When you use a nested aggregation, you iterate over the nested documents. A filter aggregation placed inside it keeps only the nested documents that match the filter query you provide. In your case, you want to keep only the nested documents whose city is Bangalore. Once the other nested documents are excluded, you can apply a terms bucket aggregation on lab_id.
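As a rough illustration of the nested, filter, then terms chain described here, the following Python sketch only builds the search request body and prints it as JSON. It is not taken from the original answer. It assumes that inventory_arr is mapped as type nested, that labs is a plain object array inside it, and that a term query on inventory_arr.city matches the stored value "bangalore". The aggregation names inventory, labs_in_city and by_lab, the helper build_labs_per_city_query and its max_labs parameter are made up for this example.

```python
import json


def build_labs_per_city_query(city: str, max_labs: int = 100) -> dict:
    """Build a search body that buckets one city's inventory entries by lab_id.

    Assumptions (not confirmed by the question): "inventory_arr" is a nested
    field and "labs" is an ordinary object array inside it.
    """
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "inventory": {
                # Step 1: iterate over the nested inventory_arr documents.
                "nested": {"path": "inventory_arr"},
                "aggs": {
                    "labs_in_city": {
                        # Step 2: keep only the nested documents for this city.
                        "filter": {"term": {"inventory_arr.city": city}},
                        "aggs": {
                            "by_lab": {
                                # Step 3: bucket what is left by lab_id.
                                "terms": {
                                    "field": "inventory_arr.labs.lab_id",
                                    "size": max_labs,
                                }
                            }
                        },
                    }
                },
            }
        },
    }


body = build_labs_per_city_query("bangalore")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a search request against my_index. Under the assumptions above, each by_lab bucket's doc_count is the number of Bangalore inventory entries that list that lab. If labs were itself mapped as nested, an additional nested aggregation with path inventory_arr.labs would be needed between the filter and the terms aggregation.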

Good luck!

huangapple
  • Published on February 16, 2023, 16:30:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75469575.html