Get pagination query documents in an aggregated bucket

Question

I am working with the geotile_grid aggregation in Elasticsearch. After grouping the locations into buckets, I want to retrieve the documents in a given bucket with pagination (using search_after). Has anyone done this, and how can I achieve it? Thank you!

Here is the GeoTile aggregation I have used:

GET /index-name/_doc/_search
{
  "aggs": {
     "result": {
        "geotile_grid": {
          "field": "location",
          "precision": 12
        }
     }
   }
}

The result looks like this:

{
  "took": 3,
  "hits": {
    "total": {
      "value": 39,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [ ... ]
  },
  "aggregations": {
    "result": {
      "buckets": [
        {
          "key": "12/3519/1597",
          "doc_count": 36
        },
        {
          "key": "12/3520/1597",
          "doc_count": 3
        }
      ]
    }
  }
}

For example, how can I get the 36 documents in the "12/3519/1597" bucket?
Thank you!

I have already tried converting the GeoTile key "12/3519/1597" into a bounding box, both by following this article and by using GeoTileUtils from the Elasticsearch source code.

However, when I convert the key "12/3519/1597" from the example above into a bounding box and query all the documents in that box, the results fall into 2 buckets. The x=3520 bucket contains documents at lon=129.375, which lie exactly on the right edge of the box.
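The edge behaviour described above can be reproduced outside Elasticsearch. The following is a minimal sketch that assumes the grid uses the standard Web Mercator (slippy map) tiling formula with floor rounding; it is not copied from GeoTileUtils, and `point_to_tile` and the latitude 36.77 are hypothetical names and values chosen for illustration.

```python
import math


def point_to_tile(lat: float, lon: float, zoom: int) -> str:
    """Return the z/x/y key of the Web Mercator tile containing a point.

    Assumption: standard slippy-map formula with floor rounding.
    """
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"{zoom}/{x}/{y}"


# A point strictly inside the tile from the question.
print(point_to_tile(36.77, 129.30, 12))
# A point exactly on the right edge (lon = 129.375) lands in the next column.
print(point_to_tile(36.77, 129.375, 12))
```

Under this formula each tile owns its left and top edges only, which is why an inclusive bounding box query can return points that the aggregation counted in the neighbouring bucket.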

Answer 1

Score: 1

You could nest a top_hits aggregation under the geotile_grid aggregation to get documents per geotile bucket.
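As a rough illustration of the nesting, the sketch below builds such a request body as a plain Python dict, reusing the field name and precision from the question. The helper name `geotile_with_top_hits` and the sub-aggregation name `docs` are made up for this example. Note that top_hits returns only the first few hits of each bucket, so it is better suited to previews than to deep pagination.

```python
import json


def geotile_with_top_hits(field: str, precision: int, hits_per_tile: int) -> dict:
    """Build a search body with top_hits nested under geotile_grid.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    return {
        "size": 0,  # skip top-level hits; documents come back inside each bucket
        "aggs": {
            "result": {
                "geotile_grid": {"field": field, "precision": precision},
                "aggs": {
                    # "docs" is an arbitrary name for the sub-aggregation
                    "docs": {"top_hits": {"size": hits_per_tile}}
                },
            }
        },
    }


body = geotile_with_top_hits("location", 12, 10)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after a `GET /index-name/_search` line in the console.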

You could also use the geo_grid query to filter documents by tile.

GET kibana_sample_data_logs/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "geo_grid": {
            "geo.coordinates": {
              "geotile": "5/9/12"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Response

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 675,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [
      {
        "_index": ".ds-kibana_sample_data_logs-2023.07.12-000001",
        "_id": "NM-ISokB7DQkCI7yJZQ-",
        "_score": 0,
        "_source": {
          "agent": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes": 8973,
          "clientip": "213.50.214.248",
          "extension": "rpm",
          "geo": {
            "srcdest": "US:VN",
            "src": "US",
            "dest": "VN",
            "coordinates": {
              "lat": 40.19349528,
              "lon": -76.76340361
            }
          },
          "host": "artifacts.elastic.co",
          "index": "kibana_sample_data_logs",
          "ip": "213.50.214.248",
          "machine": {
            "ram": 12884901888,
            "os": "win 8"
          },
          "memory": null,
          "message": "213.50.214.248 - - [2018-09-10T11:39:18.812Z] \"GET /beats/metricbeat/metricbeat-6.3.2-i686.rpm HTTP/1.1\" 200 8973 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory": null,
          "referer": "http://www.elastic-elastic-elastic.com/success/daniel-tani",
          "request": "/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "response": 200,
          "tags": [
            "success",
            "info"
          ],
          "@timestamp": "2023-08-21T11:39:18.812Z",
          "url": "https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "utc_time": "2023-08-21T11:39:18.812Z",
          "event": {
            "dataset": "sample_web_logs"
          },
          "bytes_gauge": 8973,
          "bytes_counter": 65621715
        }
      }
    ]
  }
}
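To page through every document in a tile rather than fetch a single hit, the geo_grid filter can be combined with a sort and search_after, as the question asks. The sketch below only assembles request bodies as Python dicts. The helper names are hypothetical, and `event_id` is an assumed unique keyword field used as a tiebreaker; it is not part of the sample data set, so substitute whatever unique sortable field the index actually has.

```python
from typing import Optional


def tile_page_body(field: str, tile: str, page_size: int,
                   search_after: Optional[list] = None) -> dict:
    """Build one page of a geo_grid-filtered search (hypothetical helper)."""
    body = {
        "size": page_size,
        "query": {"bool": {"filter": [{"geo_grid": {field: {"geotile": tile}}}]}},
        # search_after needs a deterministic sort; "event_id" is an ASSUMED
        # unique field that breaks ties between equal timestamps.
        "sort": [{"@timestamp": "asc"}, {"event_id": "asc"}],
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def next_search_after(response: dict) -> Optional[list]:
    """Return the sort values of the last hit, or None when the page is empty."""
    hits = response["hits"]["hits"]
    return hits[-1]["sort"] if hits else None


first_page = tile_page_body("geo.coordinates", "5/9/12", 100)
# Hand-written stand-in for a real response, only to show the hand-off.
fake_response = {"hits": {"hits": [{"_id": "a", "sort": [1692617958812, "evt-1"]}]}}
second_page = tile_page_body("geo.coordinates", "5/9/12", 100,
                             next_search_after(fake_response))
print(second_page["search_after"])
```

Each body would be sent to `kibana_sample_data_logs/_search` in turn, stopping when a response contains no hits.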

Answer 2

Score: 0

For newer Elasticsearch versions (since 8.8), you can use @Nathan Reese's solution.

However, on older versions (mine is 7.10), I used Elasticsearch's GeoTileUtils to convert the geotile key (z/x/y) into a bounding box.

But you must be aware of the edges of the bounding box. The geotile aggregation does not include points that lie on the right and bottom edges of a tile. To exclude the points on those edges, I used a Painless script as follows:

GET /index-name/_doc/_search
{
  "size": 3,
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "location": {
              "top_left": {
                "lat": 36.80928470205938, "lon": 129.287109375
              },
              "bottom_right": {
                "lat": 36.73888412439431, "lon": 129.37500
              }
            }
          }
        },
        {
          "script": {
            "script": {
              "source": "doc['location'].lon < params.maxLon && doc['location'].lat > params.minLat",
              "lang": "painless",
              "params": {
                "minLat": 36.73888412439431,
                "maxLon": 129.37500
              }
            }
          }
        }
      ]
    }
  }
}
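The same filters can be generated for any tile key instead of hard-coding the coordinates. This is a minimal sketch, assuming the standard Web Mercator tile formula and the half-open tile described above (left and top edges included, right and bottom edges excluded). `tile_to_bbox` and `tile_filters` are hypothetical helpers, not part of GeoTileUtils.

```python
import math


def tile_to_bbox(key: str) -> dict:
    """Convert a "z/x/y" geotile key into its bounding box in degrees."""
    zoom, x, y = (int(part) for part in key.split("/"))
    n = 2 ** zoom  # number of tiles along each axis

    def lon_of(col: int) -> float:
        return col / n * 360.0 - 180.0

    def lat_of(row: int) -> float:
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    return {"top": lat_of(y), "left": lon_of(x),
            "bottom": lat_of(y + 1), "right": lon_of(x + 1)}


def tile_filters(field: str, key: str) -> list:
    """Build bool filters matching only points the tile owns (assumed half-open)."""
    box = tile_to_bbox(key)
    return [
        {"geo_bounding_box": {field: {
            "top_left": {"lat": box["top"], "lon": box["left"]},
            "bottom_right": {"lat": box["bottom"], "lon": box["right"]},
        }}},
        # Drop points lying exactly on the right or bottom edge.
        {"script": {"script": {
            "lang": "painless",
            "source": (f"doc['{field}'].lon < params.maxLon"
                       f" && doc['{field}'].lat > params.minLat"),
            "params": {"minLat": box["bottom"], "maxLon": box["right"]},
        }}},
    ]


box = tile_to_bbox("12/3519/1597")
print(box)
filters = tile_filters("location", "12/3519/1597")
```

The returned list can be used as the `filter` array of a `bool` query, together with a sort and search_after for pagination.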

huangapple
  • Posted on July 3, 2023, 12:43:06
  • Source: https://go.coder-hub.com/76601882.html