2023-07-27 23:41:57 · go

Elasticsearch: Reindexing doubled the size of an index

Question

I just did a full reindex from a dump of a previous index, but the newly created index was already double the size of the previous one before all the documents had been indexed.
What could be the reason?

The previous index was 3.7 GB and the new one is 7 GB.

Update: It has now come down to 5.2 GB (probably due to segment merges), but it is still larger than the previous index, which is 3.7 GB.

Here is the shards output for both indices:

Answer 1

Score: 1

The old and new index sizes differ because of unassigned shards.

GET _cat/shards/index_name_1,index_name_2?v

The API call above shows that the smaller index has some unassigned shards, and unassigned shards affect store.size. store.size is the sum of the sizes of all assigned shard copies; a shard that is unassigned is not counted.

For the bigger index, pri.store.size and store.size differ. This means that one replica of the bigger index is allocated, while 2 replicas of the smaller index remain unassigned. Comparing pri.store.size between the two indices therefore gives a fairer picture than comparing store.size.
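To make the store.size arithmetic concrete, here is a minimal Python sketch. It assumes the rows come from `GET _cat/shards/index_name_1,index_name_2?format=json&bytes=b`, so that `store` is a byte count; the function name `summarize_shards` and the sample rows are hypothetical and are not the asker's real output.

```python
from collections import defaultdict


def summarize_shards(rows):
    """Sum store bytes per index from `_cat/shards?format=json&bytes=b` rows."""
    summary = defaultdict(
        lambda: {"primary_bytes": 0, "total_bytes": 0, "unassigned": 0}
    )
    for row in rows:
        stats = summary[row["index"]]
        if row["state"] == "UNASSIGNED":
            # An unassigned shard copy lives on no node, so it adds no bytes.
            stats["unassigned"] += 1
            continue
        size = int(row["store"] or 0)
        stats["total_bytes"] += size
        if row["prirep"] == "p":
            stats["primary_bytes"] += size
    return dict(summary)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"index": "index_name_1", "shard": "0", "prirep": "p", "state": "STARTED", "store": "1000"},
    {"index": "index_name_1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_2", "shard": "0", "prirep": "p", "state": "STARTED", "store": "1000"},
    {"index": "index_name_2", "shard": "0", "prirep": "r", "state": "STARTED", "store": "1000"},
]

summary = summarize_shards(sample_rows)
for name, stats in summary.items():
    print(name, stats)
```

In this made-up data both indices hold the same primary data, yet the index with an assigned replica reports twice the total, which is the effect described above.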

You can check why the shards are unassigned with the following API call:

GET _cluster/allocation/explain

Elasticsearch makes up to 5 attempts to allocate a shard. After 5 failed attempts it stops retrying automatically. You can ask it to retry the failed allocations with the following API call:

POST _cluster/reroute?retry_failed=true
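The two calls above can also be issued from a script. The sketch below uses only the Python standard library and assumes an unsecured cluster reachable at `http://localhost:9200`; that URL and the helper names (`build_explain_request`, `build_retry_request`, `send`) are assumptions for illustration, and a secured cluster would additionally need authentication and TLS settings.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without security


def build_explain_request(base_url=BASE_URL):
    """Build GET _cluster/allocation/explain (no body: explains an arbitrary unassigned shard)."""
    return urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain", method="GET"
    )


def build_retry_request(base_url=BASE_URL):
    """Build POST _cluster/reroute?retry_failed=true."""
    return urllib.request.Request(
        f"{base_url}/_cluster/reroute?retry_failed=true", method="POST"
    )


def send(request):
    """Send the request and decode the JSON response (needs a reachable cluster)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


explain_request = build_explain_request()
retry_request = build_retry_request()
```

Calling `send(explain_request)` first and reading the explanation before calling `send(retry_request)` is usually the safer order, since a retry will fail again if the underlying cause is still present.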

Please note that if you are hitting a disk watermark (e.g. insufficient disk space), the allocation will fail again. You can free up disk space by removing old indices, old Elasticsearch logs, and so on.
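One way to spot a watermark problem before retrying is to look at per-node disk usage. The sketch below assumes rows from `GET _cat/allocation?format=json` and uses 85 percent, the Elasticsearch default for the low disk watermark; your cluster may be configured differently. The function name `nodes_over_watermark` and the sample rows are hypothetical.

```python
# Default value of cluster.routing.allocation.disk.watermark.low (assumed unchanged).
LOW_WATERMARK_PERCENT = 85


def nodes_over_watermark(rows, threshold=LOW_WATERMARK_PERCENT):
    """Return node names whose disk usage is at or above the threshold."""
    flagged = []
    for row in rows:
        percent = row.get("disk.percent")
        if percent is None:
            # The UNASSIGNED summary row carries no disk figures.
            continue
        if int(percent) >= threshold:
            flagged.append(row["node"])
    return flagged


# Hypothetical sample rows, for illustration only.
sample_allocation = [
    {"node": "node-1", "shards": "12", "disk.percent": "91"},
    {"node": "node-2", "shards": "10", "disk.percent": "40"},
    {"node": "UNASSIGNED", "shards": "2", "disk.percent": None},
]

flagged_nodes = nodes_over_watermark(sample_allocation)
print(flagged_nodes)
```

If every data node that could host the replicas is flagged, freeing disk space as described above is likely needed before the retry can succeed.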
