July 3, 2023 12:43:06

Get paginated documents from an aggregation bucket

Question

I am working with GeoTile aggregations in Elasticsearch. After grouping the locations into buckets, I want to retrieve the documents in a specific bucket with pagination (using search_after). Has anyone done this, and how can I achieve it? Thank you!

Here is the GeoTile aggregation I have used:

GET /index-name/_search
{
  "aggs": {
    "result": {
      "geotile_grid": {
        "field": "location",
        "precision": 12
      }
    }
  }
}

The result looks like this:

{
  "took": 3,
  "hits": {
    "total": {
      "value": 39,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [ ... ]
  },
  "aggregations": {
    "result": {
      "buckets": [
        {
          "key": "12/3519/1597",
          "doc_count": 36
        },
        {
          "key": "12/3520/1597",
          "doc_count": 3
        }
      ]
    }
  }
}

For example, how can I get the 36 documents in the "12/3519/1597" bucket?
Thank you!

I have already tried converting the GeoTile key "12/3519/1597" into a bounding box, either by following this article or by using GeoTileUtils from the Elasticsearch source code.

However, when I convert the key "12/3519/1597" from the example above into a bounding box and query all the documents in that box, the results fall into 2 buckets. The "x=3520" bucket contains documents at "lon=129.375", which lie exactly on the "right edge" of the box.
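To make the conversion step concrete, here is a minimal Python sketch that turns a geotile key into a bounding box. It assumes the standard Web Mercator "slippy map" tile formula, which, as far as I know, is what GeoTileUtils is based on; the names geotile_to_bbox and BBox are hypothetical, and the latitudes may differ from Elasticsearch's own output in the last few digits.

```python
import math
from typing import NamedTuple


class BBox(NamedTuple):
    top: float     # max latitude
    left: float    # min longitude
    bottom: float  # min latitude
    right: float   # max longitude


def geotile_to_bbox(key: str) -> BBox:
    """Convert a geotile key "z/x/y" into its bounding box (Web Mercator tiles)."""
    z, x, y = (int(part) for part in key.split("/"))
    tiles = 2 ** z  # number of tiles along each axis at this zoom level

    def tile_lon(tx: int) -> float:
        # Longitude of the left edge of column tx.
        return tx / tiles * 360.0 - 180.0

    def tile_lat(ty: int) -> float:
        # Latitude of the top edge of row ty (inverse Web Mercator projection).
        n = math.pi - 2.0 * math.pi * ty / tiles
        return math.degrees(math.atan(math.sinh(n)))

    # Row y spans from the top edge of y down to the top edge of y + 1;
    # column x spans from the left edge of x to the left edge of x + 1.
    return BBox(top=tile_lat(y), left=tile_lon(x),
                bottom=tile_lat(y + 1), right=tile_lon(x + 1))


box = geotile_to_bbox("12/3519/1597")
print(box)
```

For "12/3519/1597" this gives a right edge of exactly 129.375, the longitude of the documents that ended up in the "x=3520" bucket.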


Answer 1

Score: 1


You could nest a top_hits aggregation under the geotile_grid aggregation to get the documents in each geotile bucket.
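As a rough sketch of that suggestion, the Python snippet below builds a request body that nests a top_hits sub-aggregation under the geotile_grid aggregation from the question. The field name location and the precision come from the question; the helper name geotile_top_hits_body, the sub-aggregation name docs and the sort field @timestamp are assumptions. Note that top_hits pages with from/size rather than search_after, and as far as I know from + size is capped by the index.max_inner_result_window setting (100 by default), so this suits showing a few documents per bucket rather than paging through a large one.

```python
import json


def geotile_top_hits_body(precision: int, page: int, page_size: int) -> dict:
    """Build a search body returning one page of documents per geotile bucket.

    Hypothetical helper: "location" and the sort field "@timestamp" are
    assumptions about the index mapping.
    """
    return {
        "size": 0,  # only the aggregation results are needed, not top-level hits
        "aggs": {
            "result": {
                "geotile_grid": {"field": "location", "precision": precision},
                "aggs": {
                    "docs": {
                        "top_hits": {
                            # top_hits pages with from/size, not search_after
                            "from": page * page_size,
                            "size": page_size,
                            "sort": [{"@timestamp": {"order": "asc"}}],
                        }
                    }
                },
            }
        },
    }


# Second page (page index 1) of 10 documents per bucket at precision 12.
print(json.dumps(geotile_top_hits_body(12, page=1, page_size=10), indent=2))
```

In the response, each bucket's documents appear under aggregations.result.buckets[i].docs.hits.hits.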

You could also use a geo_grid query to filter documents by tile.

GET kibana_sample_data_logs/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "geo_grid": {
            "geo.coordinates": {
              "geotile": "5/9/12"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Response

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 675,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [
      {
        "_index": ".ds-kibana_sample_data_logs-2023.07.12-000001",
        "_id": "NM-ISokB7DQkCI7yJZQ-",
        "_score": 0,
        "_source": {
          "agent": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes": 8973,
          "clientip": "213.50.214.248",
          "extension": "rpm",
          "geo": {
            "srcdest": "US:VN",
            "src": "US",
            "dest": "VN",
            "coordinates": {
              "lat": 40.19349528,
              "lon": -76.76340361
            }
          },
          "host": "artifacts.elastic.co",
          "index": "kibana_sample_data_logs",
          "ip": "213.50.214.248",
          "machine": {
            "ram": 12884901888,
            "os": "win 8"
          },
          "memory": null,
          "message": "213.50.214.248 - - [2018-09-10T11:39:18.812Z] \"GET /beats/metricbeat/metricbeat-6.3.2-i686.rpm HTTP/1.1\" 200 8973 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory": null,
          "referer": "http://www.elastic-elastic-elastic.com/success/daniel-tani",
          "request": "/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "response": 200,
          "tags": [
            "success",
            "info"
          ],
          "@timestamp": "2023-08-21T11:39:18.812Z",
          "url": "https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "utc_time": "2023-08-21T11:39:18.812Z",
          "event": {
            "dataset": "sample_web_logs"
          },
          "bytes_gauge": 8973,
          "bytes_counter": 65621715
        }
      }
    ]
  }
}
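To get the search_after pagination the question asks for, the geo_grid filter can be combined with a sort and search_after. A minimal sketch, assuming an Elasticsearch version that supports geo_grid, a geo_point field named location, and a hypothetical unique keyword field event_id used as a tiebreaker so the sort order is total; the helper name tile_page_body and the sort values below are illustrative only.

```python
import json
from typing import Any, Optional, Sequence


def tile_page_body(tile_key: str, page_size: int,
                   after: Optional[Sequence[Any]] = None) -> dict:
    """Build one page of a search_after query restricted to a single geotile."""
    body: dict = {
        "size": page_size,
        "query": {
            "bool": {
                "filter": [
                    {"geo_grid": {"location": {"geotile": tile_key}}}
                ]
            }
        },
        # search_after needs a deterministic sort; "event_id" is a
        # hypothetical unique keyword field acting as a tiebreaker.
        "sort": [
            {"@timestamp": "asc"},
            {"event_id": "asc"},
        ],
    }
    if after is not None:
        # Sort values of the last hit on the previous page.
        body["search_after"] = list(after)
    return body


first = tile_page_body("12/3519/1597", page_size=10)
# Suppose the last hit of the first page came back with this "sort" array
# (illustrative values, not real output):
last_sort = [1692617958812, "abc-123"]
second = tile_page_body("12/3519/1597", page_size=10, after=last_sort)
print(json.dumps(second, indent=2))
```

If the index is written to while paging, Elasticsearch's point in time (PIT) searches can be combined with search_after to keep a consistent view across pages.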

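Answer 2

Score: 0

For newer Elasticsearch versions (8.8 and later), you can use @Nathan Reese's solution.

However, on older versions (mine is 7.10), I used Elasticsearch's GeoTileUtils to convert the geotile key (z/x/y) into a bounding box.

But you must be careful with the edges of the bounding box: the geotile aggregation does not include locations (points) that lie on the right or bottom edge of a tile. To exclude points on those edges, I used a painless script like the following: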

GET /index-name/_search
{
  "size": 3,
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "location": {
              "top_left": {
                "lat": 36.80928470205938,
                "lon": 129.287109375
              },
              "bottom_right": {
                "lat": 36.73888412439431,
                "lon": 129.375
              }
            }
          }
        },
        {
          "script": {
            "script": {
              "source": "doc['location'].lon < params.maxLon && doc['location'].lat > params.minLat",
              "lang": "painless",
              "params": {
                "minLat": 36.73888412439431,
                "maxLon": 129.375
              }
            }
          }
        }
      ]
    }
  }
}
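The edge behaviour comes from how a point is assigned to a tile: the column and row are computed with floor, so a point exactly on a tile's right or bottom edge lands in the neighbouring tile. A minimal Python sketch of that assignment, assuming the standard slippy-map formula (which, as far as I can tell, mirrors GeoTileUtils in Elasticsearch 7.x); it ignores the quantization Elasticsearch applies to stored coordinates, and the function name point_to_geotile is hypothetical.

```python
import math


def point_to_geotile(lat: float, lon: float, zoom: int) -> str:
    """Return the "z/x/y" geotile key containing (lat, lon).

    lat must be strictly between -90 and 90. floor() makes each tile include
    its left and top edges and exclude its right and bottom edges, which
    belong to the neighbouring tiles.
    """
    tiles = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    lat_sin = math.sin(math.radians(lat))
    # Web Mercator y, scaled so that 0 is the top of the map and 1 the bottom.
    y_frac = 0.5 - math.log((1 + lat_sin) / (1 - lat_sin)) / (4 * math.pi)
    y = math.floor(y_frac * tiles)
    # Clamp to the valid range (e.g. lon = 180 maps to the last column).
    x = min(max(x, 0), tiles - 1)
    y = min(max(y, 0), tiles - 1)
    return f"{zoom}/{x}/{y}"


# A point inside the tile from the question.
print(point_to_geotile(36.78, 129.30, 12))   # 12/3519/1597
# A point exactly on its right edge (lon = 129.375) falls into the next column.
print(point_to_geotile(36.78, 129.375, 12))  # 12/3520/1597
```

In query terms, up to floating-point rounding, a tile keeps points with left <= lon < right and bottom < lat <= top, whereas a plain geo_bounding_box query also matches points on the right and bottom edges.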

