Elasticsearch: Reindexing doubled the size of an index

Question

I just did a full reindex from a dump of a previous index, but the newly created index is already twice the size of the previous one, even before all the documents have been indexed.
What could be the reason?

The previous index was 3.7 GB and the new one is 7 GB.

Update: It has now come down to 5.2 GB (probably due to segment merging), but it is still larger than the previous index, which is 3.7 GB.

Here is the shards output for both indices:
[Screenshot of the shards output for both indices, not available]

Answer 1

Score: 1

The difference between the old and new index sizes is caused by unassigned shards.

GET _cat/shards/index_name_1,index_name_2?v

The API call above shows that the small index has some unassigned shards, and unassigned shards affect store.size. store.size is the sum of the sizes of all allocated shard copies (primaries and replicas); an unassigned shard holds no data on any node, so it is not counted.
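As a minimal sketch of that calculation, the function below sums store sizes per index from `_cat/shards` rows. It assumes the rows were fetched as JSON with byte values, e.g. `GET _cat/shards/index_name_1,index_name_2?format=json&bytes=b`, so that `store` is a byte count as a string and is empty (`None`) for unassigned copies. The sample rows and sizes are hypothetical.

```python
from collections import defaultdict


def summarize_shards(rows):
    """Aggregate _cat/shards rows per index.

    Each row needs "index", "prirep" ("p" or "r"), "state" and "store"
    (bytes as a string, or None for an unassigned copy). Unassigned copies
    are only counted, never added to the sizes, mirroring store.size.
    """
    summary = defaultdict(lambda: {"pri_store": 0, "store": 0, "unassigned": 0})
    for row in rows:
        stats = summary[row["index"]]
        if row["state"] == "UNASSIGNED" or row.get("store") is None:
            stats["unassigned"] += 1
            continue
        size = int(row["store"])
        stats["store"] += size          # primaries + allocated replicas
        if row["prirep"] == "p":
            stats["pri_store"] += size  # primaries only
    return dict(summary)


# Hypothetical rows: both indices have 1 primary and 2 replicas.
# The small index has no replica allocated; the big index has one.
rows = [
    {"index": "index_name_1", "shard": "0", "prirep": "p", "state": "STARTED", "store": "3700"},
    {"index": "index_name_1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_1", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_2", "shard": "0", "prirep": "p", "state": "STARTED", "store": "3600"},
    {"index": "index_name_2", "shard": "0", "prirep": "r", "state": "STARTED", "store": "3500"},
    {"index": "index_name_2", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "store": None},
]

for name, stats in sorted(summarize_shards(rows).items()):
    print(name, stats)
```

With these rows the small index reports the same value for both sizes, while the big index's total is roughly double its primary size.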

For the big index, pri.store.size and store.size differ. This means one of the big index's replicas is allocated, so its store.size includes that replica copy on top of the primaries, while the 2 replicas of the small index remain unassigned and its store.size covers only the primaries.
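Put as a rough relation, with $r_a$ the number of allocated replica copies of each shard (and assuming a replica is about as large as its primary, which is only approximate because each copy merges its segments independently):

```latex
\begin{aligned}
\texttt{store.size} &\approx (1 + r_a)\,\texttt{pri.store.size} \\
r_a = 0 \;(\text{small index}) &\Rightarrow \texttt{store.size} = \texttt{pri.store.size} \\
r_a = 1 \;(\text{big index}) &\Rightarrow \texttt{store.size} \approx 2\,\texttt{pri.store.size}
\end{aligned}
```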

You can check why the shards are unassigned with the following API call.

GET _cluster/allocation/explain
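If you want to script this check, here is a minimal sketch using only Python's standard library. It builds (but does not send) an explain request for one specific shard copy using the `index`, `shard` and `primary` body fields. The host `http://localhost:9200`, the index name and the lack of authentication are assumptions to adapt to your cluster. Sent without a body, the API explains the first unassigned shard it finds.

```python
import json
import urllib.request


def build_allocation_explain_request(base_url, index, shard, primary):
    """Build a GET _cluster/allocation/explain request for one shard copy.

    primary=False asks why a replica of the given shard is (un)assigned.
    """
    body = json.dumps({"index": index, "shard": shard, "primary": primary}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/allocation/explain",
        data=body,
        headers={"Content-Type": "application/json"},
        method="GET",
    )


# Hypothetical cluster address and index name.
req = build_allocation_explain_request("http://localhost:9200", "index_name_1", 0, False)
print(req.get_method(), req.full_url)

# To actually send it against a reachable cluster:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```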

Elasticsearch retries allocating a shard up to 5 times (the default of the index.allocation.max_retries setting). Once it has failed 5 times, it makes no further automatic attempts to allocate that shard. You can ask it to retry the failed allocations with the following API call.

POST _cluster/reroute?retry_failed=true
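The retry behaviour described above can be pictured with a small, simplified model. This is not Elasticsearch code: `ShardCopy` is a hypothetical class, and the limit of 5 mirrors the default of `index.allocation.max_retries`. In the model, a reroute with `retry_failed=true` resets the failure counter so allocation is attempted again; if the underlying cause is still present, those attempts fail as well.

```python
MAX_RETRIES = 5  # default value of index.allocation.max_retries


class ShardCopy:
    """Toy model of an unassigned shard copy and its failed-allocation counter."""

    def __init__(self):
        self.failed_attempts = 0
        self.assigned = False

    def try_allocate(self, can_allocate):
        """Attempt allocation unless assigned or the retry limit is reached.

        Returns True if an attempt was actually made.
        """
        if self.assigned or self.failed_attempts >= MAX_RETRIES:
            return False  # no more automatic attempts
        if can_allocate:
            self.assigned = True
        else:
            self.failed_attempts += 1
        return True

    def reroute_retry_failed(self):
        """Models POST _cluster/reroute?retry_failed=true: reset the counter."""
        self.failed_attempts = 0


copy = ShardCopy()
attempts = sum(copy.try_allocate(can_allocate=False) for _ in range(10))
print(attempts, copy.assigned)  # 5 False: gives up after 5 failures

copy.reroute_retry_failed()
copy.try_allocate(can_allocate=True)  # e.g. after the cause has been fixed
print(copy.assigned)  # True
```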

Note that if the allocation fails because of a disk watermark, e.g. insufficient disk space, it will fail again. You can free up disk space by, for example, deleting old indices or old Elasticsearch logs.
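For a rough, local indication of whether a data disk is near the watermarks, the sketch below compares disk usage with the default percentage thresholds of `cluster.routing.allocation.disk.watermark.low` (85%), `.high` (90%) and `.flood_stage` (95%). It is only an approximation: Elasticsearch evaluates its own node statistics, your cluster may override these settings (including with absolute byte values), and the path checked here is a placeholder for the node's data path.

```python
import shutil

# Default percentage watermarks, highest first. Roughly: above "low" no new
# replicas are allocated to the node, above "high" shards are moved away,
# above "flood_stage" indices on the node get a read-only block.
WATERMARKS = [("flood_stage", 95.0), ("high", 90.0), ("low", 85.0)]


def watermark_level(used_percent):
    """Return the highest default watermark exceeded, or "ok"."""
    for name, threshold in WATERMARKS:
        if used_percent > threshold:
            return name
    return "ok"


def disk_used_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# "." is a placeholder; point it at the node's data directory instead.
print(watermark_level(disk_used_percent(".")))
```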

huangapple
  • This article was published on July 27, 2023 at 23:41:57
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/76781413.html