英文:
What is the ideal structure for storing and querying nested data in elasticsearch?
问题
我在Elasticsearch上运行一个使用function_score的查询,其中一个嵌套字段的值用于计算(在这种情况下是价格)。以下哪种方式更好地索引数据?
{
"name": "John",
"age": 50,
"country": "US",
"subscription": {
"Plan1": {
"price": 100,
"date": "June 5th"
},
"Plan2": {
"price": 50,
"date": "June 6th"
}
}
}
或者
{
"name": "John",
"age": 50,
"country": "US",
"subscription": [
{
"name": "Plan1",
"price": 100,
"date": "June 5th"
},
{
"name": "Plan2",
"price": 50,
"date": "June 6th"
}
]
}
查询将在"计划名称"和"价格"上进行过滤,"价格"将用于评分计算。计划的数量可能会超过20个。
编辑1:方法2的示例查询
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"term": {
"subscription.name": {
"value": "Plan1"
}
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"script_score": {
"script": {
"lang": "painless",
"source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
}
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
英文:
I'm running a query on elasticsearch using function_score where the value of a nested field is used for the calculation (price in this case). Which of the following is a better way to index the data?
{
"name": "John",
"age": 50,
"country": "US",
"subscription": {
"Plan1": {
"price": 100,
"date": "June 5th"
},
"Plan2": {
"price": 50,
"date": "June 6th"
}
}
}
OR
{
"name": "John",
"age": 50,
"country": "US",
"subscription": [
{
"name": "Plan1",
"price": 100,
"date": "June 5th"
},
{
"name": "Plan2"
"price": 50,
"date": "June 6th"
}
]
}
The query would be filtering on the "plan name" and "price", and "price" will be used for score calculation. The number of plans maybe upwards of 20.
Edit 1: Sample query for approach 2
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"term": {
"subscription.name": {
"value": "Plan1"
}
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"script_score": {
"script": {
"lang": "painless",
"source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
}
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
答案1
得分: 1
根据你有多少个计划而定。如果只有两个或者少数几个,那么第一选项更好;否则,你需要将 subscription
设计为嵌套对象,但嵌套对象在查询性能方面不够优化。
使用第一选项,可以通过单一条件 subscription.Plan1.price: 100
进行计划名称和价格的筛选,而使用第二选项则需要两个条件(因此 subscription
需要是 nested
),一个是 subscription.name: Plan1
,另一个是 subscription.price: 100
。
更新 1:使用选项 1
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"exists": {
"field": "subscription.Plan1.price"
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"field_value_factor": {
"field": "subscription.Plan1.price",
"factor": 1.2
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
英文:
It depends on how many plans you have. If it's just two or a handful then the first option is better, otherwise you need to make subscription
a nested object and nested object are less optimal in terms of query performance.
With the first option, filtering on the plan name and price can be done with a single condition on subscription.Plan1.price: 100
, while with the second option, you need two conditions (hence why subscription
needs to be nested
), one on subscription.name: Plan1
and another on subscription.price: 100
UPDATE 1: using option 1
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"range": {
"createdatutc": {
"gte": "2022-11-01T00:00:00.000Z",
"lt": "2023-05-06T00:00:00.000Z",
"format": "strict_date_optional_time"
}
}
},
{
"terms": {
"country": [
"US"
]
}
},
{
"exists": {
"field": "subscription.Plan1.price"
}
}
]
}
},
"functions": [
{
"filter": {
"query_string": {
"default_field": "name",
"query": "\"john\""
}
},
"field_value_factor": {
"field": "subscription.Plan1.price",
"factor": 1.2
}
}
],
"score_mode": "sum",
"boost_mode": "replace"
}
}
}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论