Improving Elasticsearch indexing performance
Question
I'm trying to improve my ES (version 8.3.3) indexing performance.
I have a 2-node cluster (on 2 physical servers, each with 96 CPUs and 755GB RAM), and I have set the number of shards to 4, replicas to 0, and the index to roll over at 40GB (it rolls over every 30 minutes or so). There are 2 filebeat instances pushing data to the same ES.
I was wondering: no matter how many nodes I have in my ES cluster, all my data can only be written to one index. Is this where the number of shards would help parallelize writing to the index?
I have seen a slight improvement from increasing the number of shards from 2 to 4: the highest observed indexing rate increased from ~33K/s to ~41K/s, but packet drops are still being reported in my filebeat logs.
At the same time, I've read that having too many shards would decrease search performance.
My question is, for an index size of 40GB, how many shards should I set, without too much degradation in the search performance?
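For reference, a setup like the one described (4 shards, 0 replicas, rollover at 40GB) is typically expressed as an ILM policy plus an index template. A minimal sketch, where the policy name, template name, and index pattern are hypothetical placeholders:

```
PUT _ilm/policy/logs-rollover-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "40gb" }
        }
      }
    }
  }
}

PUT _index_template/logs-template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 4,
      "number_of_replicas": 0,
      "index.lifecycle.name": "logs-rollover-policy"
    }
  }
}
```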
Answer 1
Score: 1
> I was wondering that no matter how many nodes I have in my ES cluster, all my data can only be written to one index. Is this where the number of shards would help to parallelize the writing to the index?
Yes. The primary shard count will help you parallelize the process, especially indexing. But of course, it's limited by the hardware itself; Elasticsearch can only be as fast as your hardware. So consider switching to SSD disks if you haven't already. The RAID type also affects performance significantly.
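To see why more primary shards spread indexing work, note that Elasticsearch routes each document to a primary shard by hashing its routing value (the `_id` by default) modulo the number of primary shards, so writes land on different shards in parallel. A toy sketch of that distribution, using Python's `zlib.crc32` as a stand-in for the murmur3 hash Elasticsearch actually uses:

```python
import zlib
from collections import Counter

def route(doc_id: str, num_shards: int) -> int:
    # Stand-in for ES routing: shard = hash(_routing) % number_of_primary_shards
    return zlib.crc32(doc_id.encode()) % num_shards

# Simulate routing 10,000 documents across 4 primary shards
counts = Counter(route(f"doc-{i}", 4) for i in range(10_000))
for shard, n in sorted(counts.items()):
    print(f"shard {shard}: {n} docs")
```

The hash spreads documents roughly evenly, so each additional primary shard adds another parallel writer until the hardware (disk, CPU, heap) becomes the bottleneck.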
> My question is, for an index size of 40GB, how many shards should I set, without too much degradation in the search performance?
As a best practice, it's recommended to keep shard sizes between 10GB and 50GB. Your shard sizes are already in that range; you can also check the following recommendations.
Recommendations:
- Build VMs on the 2 physical servers, set the max RAM to 62GB and the heap to 31GB. You will have 12 nodes per physical server. It will perform better.
- Have at least 3 dedicated master nodes, each with 4GB of RAM.
- Have dedicated coordinator nodes and send all HTTP traffic through them.
- Set refresh_interval to a higher number to speed up indexing.
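The refresh_interval suggestion can be applied to an existing index with a settings update (or refresh can be disabled entirely with `-1` during a heavy bulk load and re-enabled afterwards). A sketch, where the index name is a hypothetical placeholder:

```
PUT logs-000001/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}
```

The trade-off is that newly indexed documents only become visible to search after the next refresh, so a 30s interval trades search freshness for indexing throughput.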
Some useful articles:
- Tune for indexing speed
- Tune for search speed
Comments