Elasticsearch: Reindexing doubled the size of an index
Question
I just did a full reindex from a dump of a previous index, but the newly created index was double the size of the previous one even before all the documents had been indexed.
What could be the reason?
The previous index was 3.7 GB and the new one is 7 GB.
Update: It has now come down to 5.2 GB (probably due to segment merging), but as you can see it is still larger than the previous index, which is 3.7 GB.
Answer 1
Score: 1
The difference in size between the old and new indices is caused by unassigned shards.
GET _cat/shards/index_name_1,index_name_2?v
The above API call shows that the small index has some unassigned shards. Unassigned shards affect store.size. The store.size value is the sum of the sizes of all shards; a shard that is unassigned is not counted.
For the big index, pri.store.size and store.size differ. This means one of the replicas of the big index is allocated, while 2 replicas of the small index remain unassigned.
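To make the arithmetic concrete, here is a minimal Python sketch of how the two totals come apart. It assumes rows shaped like the JSON output of GET _cat/shards?format=json&bytes=b; the function name summarize_store_sizes and the byte counts in sample_rows are made up for illustration and are not taken from the asker's cluster.

```python
from collections import defaultdict


def summarize_store_sizes(shard_rows):
    """Sum shard sizes per index, skipping shards that are not allocated.

    Each row is assumed to look like one entry of
    GET _cat/shards?format=json&bytes=b (an assumption about the input).
    """
    summary = defaultdict(
        lambda: {"pri.store.size": 0, "store.size": 0, "unassigned": 0}
    )
    for row in shard_rows:
        entry = summary[row["index"]]
        # An unassigned shard has no store value, so it adds nothing.
        if row["state"] == "UNASSIGNED" or row.get("store") is None:
            entry["unassigned"] += 1
            continue
        size = int(row["store"])
        entry["store.size"] += size
        if row["prirep"] == "p":  # "p" marks a primary, "r" a replica
            entry["pri.store.size"] += size
    return dict(summary)


# Hypothetical data: one primary and two replicas per index.
sample_rows = [
    {"index": "index_name_1", "prirep": "p", "state": "STARTED", "store": "3700"},
    {"index": "index_name_1", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_1", "prirep": "r", "state": "UNASSIGNED", "store": None},
    {"index": "index_name_2", "prirep": "p", "state": "STARTED", "store": "3500"},
    {"index": "index_name_2", "prirep": "r", "state": "STARTED", "store": "3500"},
    {"index": "index_name_2", "prirep": "r", "state": "UNASSIGNED", "store": None},
]

result = summarize_store_sizes(sample_rows)
for name, totals in sorted(result.items()):
    print(name, totals)
```

In this made-up data both indices hold a similar amount of primary data, yet the second one reports roughly twice the store.size simply because one of its replicas is allocated. Comparing pri.store.size between the two indices is therefore the fairer comparison.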
You can check why the shards are unassigned with the following API call.
GET _cluster/allocation/explain
Elasticsearch will try to allocate a shard up to 5 times. After 5 failed attempts it stops retrying automatically, and the shard stays unassigned. You can ask Elasticsearch to retry the failed allocations with the following API call.
POST _cluster/reroute?retry_failed=true
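If you would rather script these two calls than run them from a console, the following is a minimal sketch using only the Python standard library. The base URL http://localhost:9200 and the helper names build_request and send are assumptions for illustration; a secured cluster would also need authentication and TLS settings, which are left out here.

```python
import json
import urllib.request


def build_request(base_url, method, path):
    """Build (but do not send) an HTTP request for an Elasticsearch endpoint."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, method=method)


def send(request):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumption: an unsecured node listening on the default local port.
BASE_URL = "http://localhost:9200"

explain_request = build_request(BASE_URL, "GET", "_cluster/allocation/explain")
retry_request = build_request(BASE_URL, "POST", "_cluster/reroute?retry_failed=true")

# Uncomment to run against a real cluster:
# print(send(explain_request))
# print(send(retry_request))
```

Reading the explain output before triggering the retry is worthwhile, because a retry will fail again if the underlying cause has not been fixed.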
Please note that if you are hitting a disk watermark (i.e. the node does not have enough free disk space), the allocation will fail again. You can free up disk space by, for example, deleting old indices or old Elasticsearch log files.
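As a rough illustration of the watermark check, the sketch below classifies a node's disk usage against the default thresholds of 85% (low), 90% (high) and 95% (flood stage). The function name watermark_status is hypothetical, your cluster may override these defaults, and the exact boundary handling inside Elasticsearch may differ from the simple comparisons used here.

```python
def watermark_status(used_bytes, total_bytes, low=85.0, high=90.0, flood_stage=95.0):
    """Classify disk usage against watermark thresholds given in percent.

    The default thresholds mirror Elasticsearch's documented defaults;
    treat them as assumptions if your cluster settings differ.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_percent = 100.0 * used_bytes / total_bytes
    if used_percent >= flood_stage:
        return "flood_stage"  # indices on the node are blocked for writes
    if used_percent >= high:
        return "high"  # Elasticsearch tries to move shards off the node
    if used_percent >= low:
        return "low"  # no new shards are allocated to the node
    return "ok"


# Made-up usage figures for a 100 GB disk.
for used_gb in (80, 86, 92, 96):
    print(used_gb, watermark_status(used_gb, 100))
```

The used and total figures for each node can be read from GET _cat/allocation?v. A node that reports "low" or worse is a likely reason for replicas staying unassigned.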