Elasticsearch: aggregate documents by field into an array

Question

I'm trying to express the following logic as an Elasticsearch query in Java.

Elasticsearch contains the following documents:

{"request": 1, "store": "ebay", "status": "retrieved", "lastdate": "2012/12/20 17:00", "retrieved_by": "John"}
{"request": 1, "store": "ebay", "status": "stored", "lastdate": "2012/12/20 18:00", "stored_by": "Alex"}
{"request": 1, "store": "ebay", "status": "bought", "lastdate": "2012/12/20 19:00", "bought_by": "Arik"}
{"request": 2, "store": "aliexpress", "status": "retrieved", "lastdate": "2012/12/20 17:00"}
{"request": 2, "store": "aliexpress", "status": "stored", "lastdate": "2012/12/20 18:00"}
{"request": 2, "store": "aliexpress", "status": "bought", "lastdate": "2012/12/20 19:00"}

I'm trying to write a query that takes a store name as input and returns that store's documents, grouped into arrays by their "request" field.

In other words, I'm trying to:

1. Filter by term on a specific field ("store").

2. Aggregate the results into arrays based on a specific field ("request").

For example, for the input "ebay":

{
  "1": [
    {"request": 1, "store": "ebay", "status": "retrieved", "lastdate": "2012/12/20 17:00", "retrieved_by": "John"},
    {"request": 1, "store": "ebay", "status": "stored", "lastdate": "2012/12/20 18:00", "stored_by": "Alex"},
    {"request": 1, "store": "ebay", "status": "bought", "lastdate": "2012/12/20 19:00", "bought_by": "Arik"}
  ],

  "..": [...]
}

It isn't important that the key in the result is the request value (any key will do). The important part is that all records are grouped by the "request" field into arrays, and that each array is sorted by "lastdate".
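To pin down the expected result before touching Elasticsearch, the same filter, group, and sort steps can be sketched in plain Java over an in-memory list. This is only an illustration of the target semantics, not an Elasticsearch query; the `GroupByRequest` class, the `Doc` type, and the `sampleDocs` helper are hypothetical names introduced for this sketch.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class GroupByRequest {
    // Same pattern as the "lastdate" values in the sample documents.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm");

    // Hypothetical in-memory stand-in for one indexed document.
    static final class Doc {
        final int request;
        final String store;
        final String status;
        final LocalDateTime lastdate;

        Doc(int request, String store, String status, String lastdate) {
            this.request = request;
            this.store = store;
            this.status = status;
            this.lastdate = LocalDateTime.parse(lastdate, FMT);
        }
    }

    // A few of the sample documents, deliberately out of order.
    static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc(1, "ebay", "bought", "2012/12/20 19:00"),
                new Doc(1, "ebay", "retrieved", "2012/12/20 17:00"),
                new Doc(2, "aliexpress", "stored", "2012/12/20 18:00"),
                new Doc(1, "ebay", "stored", "2012/12/20 18:00"));
    }

    // Step 1: keep one store. Step 2: group by request, each group ordered by lastdate.
    static Map<Integer, List<Doc>> groupByRequest(List<Doc> docs, String store) {
        return docs.stream()
                .filter(d -> d.store.equals(store))
                // Sorting first is enough: groupingBy keeps encounter order within each list.
                .sorted(Comparator.comparing((Doc d) -> d.lastdate))
                .collect(Collectors.groupingBy(d -> d.request, TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        groupByRequest(sampleDocs(), "ebay").forEach((request, docs) -> {
            System.out.println("request " + request);
            docs.forEach(d -> System.out.println("  " + d.status + " @ " + d.lastdate.format(FMT)));
        });
    }
}
```

In Elasticsearch terms, the filter corresponds to the query, the grouping to a bucket aggregation on "request", and the per-group ordering to a sort on the documents returned inside each bucket.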

My end goal is to build this query with the Java QueryBuilder API. I'm therefore starting with the native Elasticsearch query DSL, to understand which QueryBuilder to use.

Answer 1

Score: 1

Set up an elementary mapping:

PUT stores
{
  "mappings": {
    "properties": {
      "lastdate": {
        "type": "date",
        "format": "yyyy/MM/dd HH:mm"
      }
    }
  }
}
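The mapping's "format" has to agree with the "lastdate" strings that the application sends. A quick client-side check is sketched below, on the assumption that `java.time.format.DateTimeFormatter` interprets this particular pattern closely enough to Elasticsearch's own date parsing to catch obvious mismatches; the `LastDateFormatCheck` class and `matchesMapping` method are hypothetical names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class LastDateFormatCheck {
    // Same pattern string as the "format" in the mapping above.
    static final DateTimeFormatter LASTDATE = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm");

    // Returns true when the whole value parses with the mapping's pattern.
    static boolean matchesMapping(String value) {
        try {
            LocalDateTime.parse(value, LASTDATE);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {"2012/12/20 17:00", "2012-12-20 17:00", "2012/12/20"};
        for (String value : candidates) {
            System.out.println(value + " -> " + matchesMapping(value));
        }
    }
}
```

This is only a pre-flight check. Elasticsearch remains the authority on what it accepts for the field.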

Index a few documents:

POST _bulk
{"index":{"_index":"stores"}}
{"request":1,"store":"ebay","status":"retrieved","lastdate":"2012/12/20 17:00","retrieved_by":"John"}
{"index":{"_index":"stores"}}
{"request":1,"store":"ebay","status":"stored","lastdate":"2012/12/20 18:00","stored_by":"Alex"}
{"index":{"_index":"stores"}}
{"request":1,"store":"ebay","status":"bought","lastdate":"2012/12/20 19:00","bought_by":"Arik"}
{"index":{"_index":"stores"}}
{"request":2,"store":"aliexpress","status":"retrieved","lastdate":"2012/12/20 17:00"}
{"index":{"_index":"stores"}}
{"request":2,"store":"aliexpress","status":"stored","lastdate":"2012/12/20 18:00"}
{"index":{"_index":"stores"}}
{"request":2,"store":"aliexpress","status":"bought","lastdate":"2012/12/20 19:00"}

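Filter in the query, then aggregate by the "request" field and use a sorted top_hits sub-aggregation. The sort below is newest-first; switch "order" to "asc" for the oldest-first order shown in the question: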

GET stores/_search
{
  "size": 0, 
  "query": {
    "term": {
      "store": {
        "value": "ebay"
      }
    }
  },
  "aggs": {
    "by_req": {
      "terms": {
        "field": "request"
      },
      "aggs": {
        "hits": {
          "top_hits": {
            "sort": [
              {
                "lastdate": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
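One thing to watch with the query above: when no "size" is given, the terms aggregation returns 10 buckets and top_hits returns 3 hits per bucket. That happens to fit the sample data, but larger data sets would be truncated. The sketch below builds the same request body as a plain JSON string with both sizes and the sort order made explicit. The `StoreQueryBody` class and its parameters are hypothetical, and the string building is deliberately naive, escaping only quotes and backslashes in the store name.

```java
class StoreQueryBody {
    // Builds the search body from the answer, with explicit bucket and hit limits.
    static String build(String store, int maxRequests, int maxDocsPerRequest, String order) {
        if (!order.equals("asc") && !order.equals("desc")) {
            throw new IllegalArgumentException("order must be \"asc\" or \"desc\"");
        }
        // Minimal escaping only; a real client would use a JSON library instead.
        String escaped = store.replace("\\", "\\\\").replace("\"", "\\\"");
        return String.join("\n",
                "{",
                "  \"size\": 0,",
                "  \"query\": { \"term\": { \"store\": { \"value\": \"" + escaped + "\" } } },",
                "  \"aggs\": {",
                "    \"by_req\": {",
                "      \"terms\": { \"field\": \"request\", \"size\": " + maxRequests + " },",
                "      \"aggs\": {",
                "        \"hits\": {",
                "          \"top_hits\": {",
                "            \"size\": " + maxDocsPerRequest + ",",
                "            \"sort\": [ { \"lastdate\": { \"order\": \"" + order + "\" } } ]",
                "          }",
                "        }",
                "      }",
                "    }",
                "  }",
                "}");
    }

    public static void main(String[] args) {
        // Prints a body that can be pasted after "GET stores/_search".
        System.out.println(build("ebay", 50, 20, "asc"));
    }
}
```

The limits passed in `main` are arbitrary illustration values; suitable numbers depend on how many requests a store has and how many status documents each request accumulates.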

Translating this into the Java DSL shouldn't be too difficult.

Answer 2

Score: 0

@joe posted the right answer in the ES DSL (thanks again!).

My goal was to use the query from Java. In case someone else needs the Java DSL code, here it is:

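// Filter on the store, then bucket by request with each bucket's hits sorted by lastdate (oldest first).
QueryBuilder storeQuery = QueryBuilders.boolQuery().filter(QueryBuilders.termsQuery("store", "ebay"));
AggregationBuilder subAgg = AggregationBuilders.topHits("hits").sort("lastdate", SortOrder.ASC);
AggregationBuilder mainAgg = AggregationBuilders.terms("by_req").field("request").subAggregation(subAgg);
// Not in the original answer: one way to attach both to a search request body.
SearchSourceBuilder source = new SearchSourceBuilder().size(0).query(storeQuery).aggregation(mainAgg);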

huangapple
  • Published on July 30, 2020, 16:07:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63168832.html