Having more than 2B docs in an Elasticsearch index
Question
I have an Elasticsearch cluster with one index. At some point I started getting the following error when inserting new documents:
> number of documents in the index cannot exceed 2147483519
After searching online I saw that splitting the index into several primary shards should help. I split the index into a new index with 3 shards (instead of the previous 1) but I'm still getting the error writing to the newly split index.
Did I misunderstand this? Is there anything I can do to be able to continue adding more documents to the index?
Thanks!
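For context, resharding an existing index in Elasticsearch is done with the `_split` API: the source index must be made read-only first, and the target's shard count must be a multiple of the source's. A sketch of the procedure, with placeholder index names:

```
# block writes on the source index (required before splitting)
PUT /my-index/_settings
{
  "settings": {
    "index.blocks.write": true
  }
}

# split the 1-shard source into a new 3-shard index
POST /my-index/_split/my-index-split
{
  "settings": {
    "index.number_of_shards": 3
  }
}
```

After the split, writes should go to the new index (or to an alias pointing at it), not to the old one.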
Answer 1

Score: 1
In theory, increasing the number of primary shards should have helped; can you confirm you're not trying to write into the same exhausted shard (e.g., you have some routing or allocation strategy in place, or some join types)?
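One way to check this: the 2,147,483,519 limit is a per-shard (Lucene) limit, so per-shard document counts show whether writes are still landing on a single exhausted shard. A sketch, assuming the split index is named `my-index-split`:

```
GET _cat/shards/my-index-split?v&h=index,shard,prirep,docs
```

If one primary still shows roughly 2.1B docs while the others are nearly empty, custom routing (or a join field) is concentrating writes on that one shard, and splitting alone won't help.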
Anyhow, I would recommend splitting your index into multiple indices rather than increasing the number of shards; which approach fits depends on your specific use case. E.g.,
- If you're dealing with append-only data, definitely go for a rolling index approach.
- If you're not dealing with append-only data, try to split the index across one particular dimension, e.g., based on a field value.
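For the append-only case, the rolling approach can be built on the rollover API: clients write to an alias, and a new backing index is created whenever a condition is met. A minimal sketch, with illustrative names and an assumed threshold of 1B docs:

```
# bootstrap the first backing index with a write alias
PUT /my-data-000001
{
  "aliases": {
    "my-data-write": { "is_write_index": true }
  }
}

# roll over to my-data-000002 once the current index grows too large;
# clients keep writing to the alias and never hit the per-shard limit
POST /my-data-write/_rollover
{
  "conditions": {
    "max_docs": 1000000000
  }
}
```

In practice the rollover call is usually automated via ILM (index lifecycle management) rather than issued by hand.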
Comments