Having more than 2B docs in an Elasticsearch index

Question

I have an Elasticsearch cluster with one index. At some point I started getting the following error when inserting new documents:
> number of documents in the index cannot exceed 2147483519

After searching online I saw that splitting the index into several primary shards should help. I split the index into a new index with 3 primary shards (instead of the previous single shard), but I'm still getting the same error when writing to the newly split index.
Did I misunderstand this? Is there anything I can do to be able to continue adding more documents to the index?
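For reference, here is roughly how such a split is done with the `_split` API (a minimal sketch, not necessarily the exact calls I ran; the cluster URL and index names are placeholders):

```python
import requests

ES = "http://localhost:9200"                    # placeholder cluster address
SOURCE, TARGET = "my-index", "my-index-split"   # placeholder index names

# 1. The source index must block writes before it can be split.
requests.put(f"{ES}/{SOURCE}/_settings", json={"index.blocks.write": True})

# 2. Split the single-shard index into a new index with 3 primary shards
#    (the target shard count must be a multiple of the source's).
requests.post(f"{ES}/{SOURCE}/_split/{TARGET}",
              json={"settings": {"index.number_of_shards": 3}})

# 3. Re-enable writes on the new index (setting the block to null resets it).
requests.put(f"{ES}/{TARGET}/_settings", json={"index.blocks.write": None})
```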

Thanks!

Answer 1

Score: 1

In theory, increasing the number of primary shards should have helped; can you confirm you're not still writing into the same exhausted shard (e.g., because of a custom routing or allocation strategy, or the use of join field types)?
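A quick way to check is to compare per-shard document counts. This sketch assumes the Python `requests` library, a cluster reachable on localhost, and a placeholder index name:

```python
import requests

ES = "http://localhost:9200"    # placeholder cluster address
INDEX = "my-index-split"        # placeholder index name

# _cat/shards lists the document count per shard: if one primary sits near the
# ~2,147,483,519 per-shard limit while the others stay small, writes are being
# routed to a single shard and adding primaries alone won't help.
resp = requests.get(f"{ES}/_cat/shards/{INDEX}",
                    params={"v": "true", "h": "index,shard,prirep,docs,store"})
print(resp.text)
```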

Anyhow, I would recommend splitting your index into multiple indices rather than increasing the number of shards; it depends on your specific use case. E.g.,

  • if you're dealing with append-only data, definitely go for a rolling index approach (see the rollover sketch after this list)
  • if you're not dealing with append-only data, try to split the index across one particular dimension, e.g., based on a field value.
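For the append-only case, a rolling index setup can be as simple as a write alias plus periodic `_rollover` calls (a minimal sketch with `requests`; the cluster URL, alias/index names, and thresholds are placeholders):

```python
import requests

ES = "http://localhost:9200"  # placeholder cluster address

# One-time bootstrap: first backing index plus a write alias that all ingestion targets.
requests.put(f"{ES}/my-data-000001",
             json={"aliases": {"my-data-write": {"is_write_index": True}}})

# Run periodically (cron, ILM, or application code): when any condition is met,
# Elasticsearch creates my-data-000002 and moves the write alias to it, so no
# single index ever approaches the ~2.14B-docs-per-shard ceiling.
requests.post(f"{ES}/my-data-write/_rollover",
              json={"conditions": {"max_docs": 1_000_000_000, "max_size": "200gb"}})
```

On a recent Elasticsearch you can also let an ILM policy with a rollover action handle this automatically instead of calling `_rollover` yourself.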
