在Elasticsearch中存储和查询嵌套数据的理想结构是什么?

huangapple go评论94阅读模式
英文:

What is the ideal structure for storing and querying nested data in elasticsearch?

问题

我在Elasticsearch上运行一个使用function_score的查询,其中一个嵌套字段的值用于计算(在这种情况下是价格)。以下哪种方式更好地索引数据?

  1. {
  2. "name": "John",
  3. "age": 50,
  4. "country": "US",
  5. "subscription": {
  6. "Plan1": {
  7. "price": 100,
  8. "date": "June 5th"
  9. },
  10. "Plan2": {
  11. "price": 50,
  12. "date": "June 6th"
  13. }
  14. }
  15. }

或者

  1. {
  2. "name": "John",
  3. "age": 50,
  4. "country": "US",
  5. "subscription": [
  6. {
  7. "name": "Plan1",
  8. "price": 100,
  9. "date": "June 5th"
  10. },
  11. {
  12. "name": "Plan2",
  13. "price": 50,
  14. "date": "June 6th"
  15. }
  16. ]
  17. }

查询将在"计划名称"和"价格"上进行过滤,"价格"将用于评分计算。计划的数量可能会超过20个。

编辑1:方法2的示例查询

  1. {
  2. "query": {
  3. "function_score": {
  4. "query": {
  5. "bool": {
  6. "filter": [
  7. {
  8. "range": {
  9. "createdatutc": {
  10. "gte": "2022-11-01T00:00:00.000Z",
  11. "lt": "2023-05-06T00:00:00.000Z",
  12. "format": "strict_date_optional_time"
  13. }
  14. }
  15. },
  16. {
  17. "terms": {
  18. "country": [
  19. "US"
  20. ]
  21. }
  22. },
  23. {
  24. "term": {
  25. "subscription.name": {
  26. "value": "Plan1"
  27. }
  28. }
  29. }
  30. ]
  31. }
  32. },
  33. "functions": [
  34. {
  35. "filter": {
  36. "query_string": {
  37. "default_field": "name",
  38. "query": "\"john\""
  39. }
  40. },
  41. "script_score": {
  42. "script": {
  43. "lang": "painless",
  44. "source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
  45. }
  46. }
  47. }
  48. ],
  49. "score_mode": "sum",
  50. "boost_mode": "replace"
  51. }
  52. }
  53. }
英文:

I'm running a query on elasticsearch using function_score where the value of a nested field is used for the calculation (price in this case). Which of the following is a better way to index the data?

  1. {
  2. "name": "John",
  3. "age": 50,
  4. "country": "US",
  5. "subscription": {
  6. "Plan1": {
  7. "price": 100,
  8. "date": "June 5th"
  9. },
  10. "Plan2": {
  11. "price": 50,
  12. "date": "June 6th"
  13. }
  14. }
  15. }
  16. OR
  17. {
  18. "name": "John",
  19. "age": 50,
  20. "country": "US",
  21. "subscription": [
  22. {
  23. "name": "Plan1",
  24. "price": 100,
  25. "date": "June 5th"
  26. },
  27. {
  28. "name": "Plan2"
  29. "price": 50,
  30. "date": "June 6th"
  31. }
  32. ]
  33. }

The query would be filtering on the "plan name" and "price", and "price" will be used for score calculation. The number of plans maybe upwards of 20.

Edit 1: Sample query for approach 2

  1. {
  2. "query": {
  3. "function_score": {
  4. "query": {
  5. "bool": {
  6. "filter": [
  7. {
  8. "range": {
  9. "createdatutc": {
  10. "gte": "2022-11-01T00:00:00.000Z",
  11. "lt": "2023-05-06T00:00:00.000Z",
  12. "format": "strict_date_optional_time"
  13. }
  14. }
  15. },
  16. {
  17. "terms": {
  18. "country": [
  19. "US"
  20. ]
  21. }
  22. },
  23. {
  24. "term": {
  25. "subscription.name": {
  26. "value": "Plan1"
  27. }
  28. }
  29. }
  30. ]
  31. }
  32. },
  33. "functions": [
  34. {
  35. "filter": {
  36. "query_string": {
  37. "default_field": "name",
  38. "query": "\"john\""
  39. }
  40. },
  41. "script_score": {
  42. "script": {
  43. "lang": "painless",
  44. "source": "for (item in params._source.subscription) {if (item.name == 'Plan1') {return item.price}}"
  45. }
  46. }
  47. }
  48. ],
  49. "score_mode": "sum",
  50. "boost_mode": "replace"
  51. }
  52. }
  53. }

答案1

得分: 1

根据你有多少个计划而定。如果只有两个或者少数几个,那么第一选项更好;否则,你需要将 subscription 设计为嵌套对象,但嵌套对象在查询性能方面不够优化。

使用第一选项,可以通过单一条件 subscription.Plan1.price: 100 进行计划名称和价格的筛选,而使用第二选项则需要两个条件(因此 subscription 需要是 nested),一个是 subscription.name: Plan1,另一个是 subscription.price: 100

更新 1:使用选项 1

  1. {
  2. "query": {
  3. "function_score": {
  4. "query": {
  5. "bool": {
  6. "filter": [
  7. {
  8. "range": {
  9. "createdatutc": {
  10. "gte": "2022-11-01T00:00:00.000Z",
  11. "lt": "2023-05-06T00:00:00.000Z",
  12. "format": "strict_date_optional_time"
  13. }
  14. }
  15. },
  16. {
  17. "terms": {
  18. "country": [
  19. "US"
  20. ]
  21. }
  22. },
  23. {
  24. "exists": {
  25. "field": "subscription.Plan1.price"
  26. }
  27. }
  28. ]
  29. }
  30. },
  31. "functions": [
  32. {
  33. "filter": {
  34. "query_string": {
  35. "default_field": "name",
  36. "query": "\"john\""
  37. }
  38. },
  39. "field_value_factor": {
  40. "field": "subscription.Plan1.price",
  41. "factor": 1.2
  42. }
  43. }
  44. ],
  45. "score_mode": "sum",
  46. "boost_mode": "replace"
  47. }
  48. }
  49. }
英文:

It depends on how many plans you have. If it's just two or a handful then the first option is better, otherwise you need to make subscription a nested object and nested object are less optimal in terms of query performance.

With the first option, filtering on the plan name and price can be done with a single condition on subscription.Plan1.price: 100, while with the second option, you need two conditions (hence why subscription needs to be nested), one on subscription.name: Plan1 and another on subscription.price: 100

UPDATE 1: using option 1

  1. {
  2. "query": {
  3. "function_score": {
  4. "query": {
  5. "bool": {
  6. "filter": [
  7. {
  8. "range": {
  9. "createdatutc": {
  10. "gte": "2022-11-01T00:00:00.000Z",
  11. "lt": "2023-05-06T00:00:00.000Z",
  12. "format": "strict_date_optional_time"
  13. }
  14. }
  15. },
  16. {
  17. "terms": {
  18. "country": [
  19. "US"
  20. ]
  21. }
  22. },
  23. {
  24. "exists": {
  25. "field": "subscription.Plan1.price"
  26. }
  27. }
  28. ]
  29. }
  30. },
  31. "functions": [
  32. {
  33. "filter": {
  34. "query_string": {
  35. "default_field": "name",
  36. "query": "\"john\""
  37. }
  38. },
  39. "field_value_factor": {
  40. "field": "subscription.Plan1.price",
  41. "factor": 1.2
  42. }
  43. }
  44. ],
  45. "score_mode": "sum",
  46. "boost_mode": "replace"
  47. }
  48. }
  49. }

huangapple
  • 本文由 发表于 2023年6月26日 15:31:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/76554446.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定