My ES 7.8.0 cluster contains 5 nodes with 120 GB of heap memory and 32 CPUs. I also have a fairly large index with about 6.6 million docs; according to the _cat/indices API, its store.size is 59.8gb and pri.store.size is 30.1gb. It has 3 primary shards and 1 replica. Is that an optimal number of shards? I have read that every shard should contain roughly 30 GB of data.
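For reference, a minimal sketch of how the per-shard store sizes could be checked against that guideline, assuming the Python elasticsearch client and a placeholder index name ("my-index"):

```python
# Sketch: list per-shard store sizes via the _cat/shards API.
# "my-index" and the connection URL are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

shards = es.cat.shards(index="my-index", format="json",
                       bytes="gb", h="index,shard,prirep,store")
for s in shards:
    role = "primary" if s["prirep"] == "p" else "replica"
    print(f'{s["index"]} shard {s["shard"]} ({role}): {s["store"]} gb')
```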
At the same time, every node in my cluster has the master, data and ingest roles. Is that ok, or should I configure it as 3 master nodes and 2 data nodes? Currently I'm fine with the search speed and availability of the cluster. In addition, I periodically catch a warning in the log file like:
took [17.6s], which is over [10s], to compute cluster state update for [put-mapping
This seems strange to me, given the fairly generous resources I have allocated to the cluster.
It looks like you either have very large or constantly expanding mappings, which is causing updates to be slow. How many fields does your index have? Have you overridden any of the default settings?
Are you using parent-child or nested mappings?
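If it helps, here is a rough sketch of how the number of mapped fields (including multi-fields) could be counted, assuming the Python client; "my-index" is a placeholder:

```python
# Sketch: rough count of mapped fields for an index, to gauge mapping size.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def count_fields(properties):
    """Recursively count fields, sub-object fields and multi-fields."""
    total = 0
    for field_def in properties.values():
        total += 1
        total += count_fields(field_def.get("properties", {}))  # object sub-fields
        total += len(field_def.get("fields", {}))                # multi-fields
    return total

mapping = es.indices.get_mapping(index="my-index")
for index_name, body in mapping.items():
    props = body["mappings"].get("properties", {})
    print(index_name, count_fields(props), "mapped fields")
```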
That does not sound like a good setting value and will most likely cause problems. No wonder mapping changes result in slow cluster state updates, as these are performed in a single thread.
I can think of no easy fix, so I suspect you may either need to live with this or reconsider how you handle mappings to avoid it.
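One common way to rein in a constantly growing mapping, assuming the growth comes from dynamically mapped keys, is to stop dynamic mapping of new top-level fields and push arbitrary key/value payloads into a single flattened field (available in the default distribution since 7.3). A minimal sketch, with placeholder index and field names:

```python
# Sketch: cap mapping growth with strict dynamic mapping plus a flattened field.
# "my-index-v2", "title" and "attributes" are placeholders for illustration.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="my-index-v2",
    body={
        "mappings": {
            # Reject documents that introduce unmapped top-level fields
            # instead of silently growing the mapping.
            "dynamic": "strict",
            "properties": {
                "title": {"type": "text"},
                # Arbitrary key/value payloads go into one flattened field,
                # so new keys never add new mapped fields.
                "attributes": {"type": "flattened"},
            },
        }
    },
)
```

Whether this fits depends on how those fields are queried; flattened fields only support keyword-style queries on their leaf values.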
Your problem seems to be with the mappings and not necessarily with shard size or distribution. I have never seen a use case with anywhere near that number of mapped fields, so I have no advice to give. This is uncharted territory as far as I know.