July 18, 2023 04:27:28


In Elasticsearch, how can I aggregate data from nested fields and their parent document?

Question

Given a sales index with the following mapping:

{
  "mappings": {
    "properties": {
      "amount": {
        "type": "float"
      },
      "created_at": {
        "type": "date",
        "format": "date_time||epoch_millis"
      },
      "events": {
        "type": "nested",
        "properties": {
          "created_at": {
            "type": "date",
            "format": "date_time||epoch_millis"
          },
          "fees": {
            "properties": {
              "amount": {
                "type": "float"
              },
              "credit_debit": {
                "type": "keyword"
              }
            }
          }
        }
      },
      "id": {
        "type": "keyword"
      },
      "status": {
        "type": "keyword"
      },
      "type": {
        "type": "keyword"
      }
    }
  }
}

My question is, how can I query for the following?

For each sales document (identified by sales.id) whose created_at falls within a specific range, show:
- the sales.id
- the sales.amount
- the maximum (i.e. latest) sales.events.created_at
- the total of sales.events.fees.amount
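To make the requested output concrete, here is a minimal plain-Python sketch of the per-sale computation on a single document. It assumes the document shape implied by the mapping, with dates already stored as epoch milliseconds, and that fees may be either one object or a list of objects. The function name summarize_sale and the sample values are hypothetical.

```python
from typing import Any, Optional


def summarize_sale(doc: dict[str, Any]) -> dict[str, Any]:
    """Compute one output row (id, amount, latest event date, total fees).

    Assumes `events` is a list of objects with `created_at` as epoch millis,
    and that each event's `fees` is either a single object or a list of them.
    """
    latest: Optional[int] = None
    total_fees = 0.0
    for event in doc.get("events", []):
        created = event.get("created_at")
        # Keep the maximum (latest) event timestamp seen so far.
        if created is not None and (latest is None or created > latest):
            latest = created
        fees = event.get("fees") or []
        if isinstance(fees, dict):  # an object field can hold one object or many
            fees = [fees]
        for fee in fees:
            # Sum every fee amount, regardless of credit_debit.
            total_fees += float(fee.get("amount", 0))
    return {
        "id": doc["id"],
        "amount": doc["amount"],
        "latest_event_created_at": latest,
        "total_fees": total_fees,
    }
```

For example, a hypothetical sale with two events whose fees are 0.25 and 0.5 yields total_fees of 0.75 and the later of the two event timestamps. Like the aggregation in the answer, this sums all fees without looking at credit_debit.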

My end goal is to have a CSV file with the results. Any solution would work, including:

- create a new index and reindex with some calculations
- an advanced aggregation query
- a Kibana SQL query
- a Kibana visualisation
- something else

Answer 1

Score: 2

You can use the following query to extract the information you need:

GET /sales/_search?filter_path=**.key,**.amount,**.created_at,**.total_fees.value,**.latest.value
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "created_at": {
              "gte": "2023-04-01T00:00:00.000+02:00",
              "lte": "2023-07-01T00:00:00.000+02:00"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "pages": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "id": {
              "terms": {
                "field": "id"
              }
            }
          }
        ]
      },
      "aggs": {
        "fields": {
          "top_hits": {
            "size": 1,
            "_source": [
              "amount",
              "created_at"
            ]
          }
        },
        "events": {
          "nested": {
            "path": "events"
          },
          "aggs": {
            "latest": {
              "max": {
                "field": "events.created_at"
              }
            },
            "total_fees": {
              "sum": {
                "field": "events.fees.amount"
              }
            }
          }
        }
      }
    }
  }
}
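If you drive the export from a script instead of Kibana Dev Tools, the same request body can be generated in Python. This is a minimal sketch that only builds the body; sending it to GET /sales/_search is left to whatever HTTP client or Elasticsearch client you use. The function name build_sales_body and its parameters are hypothetical; the optional after argument corresponds to the composite aggregation's after parameter used for pagination.

```python
from typing import Any, Optional


def build_sales_body(gte: str, lte: str,
                     after: Optional[dict[str, Any]] = None,
                     page_size: int = 1000) -> dict[str, Any]:
    """Build the search body from the query above (one page of buckets)."""
    composite: dict[str, Any] = {
        "size": page_size,
        "sources": [{"id": {"terms": {"field": "id"}}}],
    }
    if after is not None:
        # Resume after this composite key, e.g. {"id": "<last id of previous page>"}.
        composite["after"] = after
    return {
        "size": 0,  # no search hits, only aggregation buckets
        "query": {"bool": {"filter": [
            {"range": {"created_at": {"gte": gte, "lte": lte}}}
        ]}},
        "aggs": {
            "pages": {
                "composite": composite,
                "aggs": {
                    # Parent-level fields, read from the single matching document.
                    "fields": {"top_hits": {"size": 1,
                                            "_source": ["amount", "created_at"]}},
                    # Switch into the nested events to aggregate their fields.
                    "events": {
                        "nested": {"path": "events"},
                        "aggs": {
                            "latest": {"max": {"field": "events.created_at"}},
                            "total_fees": {"sum": {"field": "events.fees.amount"}},
                        },
                    },
                },
            }
        },
    }
```

Calling build_sales_body("2023-04-01T00:00:00.000+02:00", "2023-07-01T00:00:00.000+02:00") produces the first-page body; passing after={"id": "xyz"} produces the next-page variant.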

If there are more than 1000 buckets, you can fetch the next page by running the same query with an after parameter set to the id of the very last bucket on the preceding page (xyz is a placeholder):

GET /sales/_search?filter_path=**.key,**.amount,**.created_at,**.total_fees.value,**.latest.value
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "created_at": {
              "gte": "2023-04-01T00:00:00.000+02:00",
              "lte": "2023-07-01T00:00:00.000+02:00"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "pages": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "id": {
              "terms": {
                "field": "id"
              }
            }
          }
        ],
        "after": {"id": "xyz"}
      },
      "aggs": {
        "fields": {
          "top_hits": {
            "size": 1,
            "_source": [
              "amount",
              "created_at"
            ]
          }
        },
        "events": {
          "nested": {
            "path": "events"
          },
          "aggs": {
            "latest": {
              "max": {
                "field": "events.created_at"
              }
            },
            "total_fees": {
              "sum": {
                "field": "events.fees.amount"
              }
            }
          }
        }
      }
    }
  }
}
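The paging procedure above can be automated with a small loop. This is a sketch under two assumptions: search is a hypothetical function you supply that sends a body to GET /sales/_search (with the same filter_path) and returns the parsed JSON, and make_body is a hypothetical function that returns the request body for a given after value (None for the first page). It takes the next after value from the last bucket's key, as described above, because the filter_path in the query drops the response's after_key field, and it stops when a page comes back with no buckets.

```python
from typing import Any, Callable, Iterator, Optional

Body = dict[str, Any]


def iter_buckets(search: Callable[[Body], Body],
                 make_body: Callable[[Optional[Body]], Body]) -> Iterator[Body]:
    """Yield every composite bucket across all pages."""
    after: Optional[Body] = None
    while True:
        response = search(make_body(after))
        # With filter_path, an empty page may come back as {} entirely.
        buckets = response.get("aggregations", {}).get("pages", {}).get("buckets", [])
        if not buckets:
            return
        yield from buckets
        # The last bucket's key, e.g. {"id": "..."}, becomes the next "after".
        after = buckets[-1]["key"]
```

Each page costs one request, plus a final request that returns no buckets and ends the loop.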

Then, after saving the response to input.json, you can export the results to CSV using the following jq command:

jq -r '.aggregations.pages.buckets[] | [.key.id, .fields.hits.hits[]."_source".amount, .fields.hits.hits[]."_source".created_at, .events.total_fees.value, .events.latest.value] | @csv' input.json

You'll get something like this:

"056c65ec-22f6-4da1-9bce-82c12ed845cd","5.90",1681211194150,0.3499999940395355,1681289446844
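If jq is not available, the same flattening can be done with Python's standard csv module. This is a minimal sketch that mirrors the jq filter column for column (id, amount, created_at, total fees, latest event date); the function name buckets_to_csv is hypothetical. Unlike jq, it reads only the first top_hits hit (the query requests size 1), and csv.writer's default quoting leaves values such as 5.90 unquoted.

```python
import csv
import io
from typing import Any


def buckets_to_csv(response: dict[str, Any]) -> str:
    """Return one CSV row per composite bucket, in the jq filter's column order."""
    out = io.StringIO()
    writer = csv.writer(out)
    for bucket in response.get("aggregations", {}).get("pages", {}).get("buckets", []):
        source = bucket["fields"]["hits"]["hits"][0]["_source"]  # top_hits size 1
        writer.writerow([
            bucket["key"]["id"],
            source.get("amount"),
            source.get("created_at"),
            bucket["events"]["total_fees"]["value"],
            bucket["events"]["latest"]["value"],
        ])
    return out.getvalue()
```

To use it on the saved response, load input.json with json.load and write the returned string to a file opened with newline="".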
