I am currently using 3 primary shards and 2 replica shards. How does reducing the number of primary shards affect search, and what is the impact on the cluster?
What is driving the desire to change? What is the problem you are looking to solve? What is the size of the shards in the index?
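If it helps, per-shard sizes can be listed with the cat shards API; the index pattern below is just an example, so substitute your own index names:

GET _cat/shards/products-*?v&h=index,shard,prirep,store&s=store:desc

This shows every shard for the matching indices together with its on-disk size, sorted largest first.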
We currently have 3 primary shards and 2 replica shards. The shards for the live product search index are all less than 2.5GB. The staging indices have shards that are considerably smaller: less than 150MB. The live suggestion search indices are all around 1GB, and their staging counterparts are less than 120MB.
We are seeing frequent circuit breaker exceptions and bulk request failures during indexing, so we are doing this to free up some resources in the cluster.
Which version of Elasticsearch are you using?
What is causing the circuit breaker issues?
What is the size and specification of the cluster?
What type and mix of load is the cluster under?
How many indices do you have in total in the cluster? What is the total data volume?
The questions Christian is asking are all valid. For real workloads on real production clusters, shards smaller than 5-10GB most likely represent resource wastage.
Live suggestion indices are memory heavy and are sometimes an exception to this rule, but then it really matters how often you update them and what resources you have available to maintain them.
If you want a quick definitive answer you may want to connect your cluster to Pulse (https://pulse.support/), it should be able to automatically give you a good idea of what’s going on.
I don't know the forum policy on ads, and I have no opinion on your product: I have never heard of it or used it, and it can be helpful to suggest tools that might be useful to @Elastic04 and others. But if you are going to suggest your own company's products/tools, then at least be open and state very clearly your direct involvement/interest when making the suggestion, please. You made 3 posts today, and 2 are promoting your own company's products!
We are currently on 8.9 but are upgrading to 9.x next month.
We have no idea what is causing the circuit breaker exceptions at the moment, but this is the error log:
{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [indices:admin/aliases/get] would be [7561944652/7gb], which is larger than the limit of [7558764953/7gb], real usage: [7561925776/7gb], new bytes reserved: [18876/18.4kb], usages [model_inference=0/0b, inflight_requests=89216/87.1kb, request=846238/826.4kb, fielddata=381693/372.7kb, eql_sequence=0/0b]","bytes_wanted":7561944652,"bytes_limit":7558764953,"durability":"TRANSIENT"}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [indices:admin/aliases/get] would be [7561944652/7gb], which is larger than the limit of [7558764953/7gb], real usage: [7561925776/7gb], new bytes reserved: [18876/18.4kb], usages [model_inference=0/0b, inflight_requests=89216/87.1kb, request=846238/826.4kb, fielddata=381693/372.7kb, eql_sequence=0/0b]","bytes_wanted":7561944652,"bytes_limit":7558764953,"durability":"TRANSIENT"},"status":429}
Hot nodes: 3.8 vCPU / 15GB RAM / 120GB storage. Three nodes are deployed across 3 availability zones.
171 indices, with ~211 shards per node. Current data volume: 137.4 GB.
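A side note on the circuit_breaking_exception you posted: a [parent] breaker rejection that reports "real usage" means total JVM heap usage crossed the parent breaker limit (by default 95% of the heap), not that the rejected request itself was oversized. The node stats API shows where each breaker stands, e.g.:

GET _nodes/stats/breaker

The response lists the limit, estimated usage and trip count for every breaker on each node, which should help narrow down which nodes are under heap pressure.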
That does sound like a lot of small indices and shards. I would recommend merging related indices and giving the merged indices a single primary shard. Aim for an average shard size of between 5GB and 20GB. This will likely reduce the overhead, but I am not sure by how much or whether it will be sufficient, as it depends on the load the cluster is under and the type and quantity of features used.
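For existing indices, the shrink API can reduce an index to a single primary shard in place. A minimal sketch (index and node names below are placeholders): the index must first be made read-only with a copy of every shard on one node, and can then be shrunk:

PUT /products-2024-06-01/_settings
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "shrink-node-name"
}

POST /products-2024-06-01/_shrink/products-2024-06-01-single
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 2,
    "index.routing.allocation.require._name": null
  }
}

Reindexing several small related indices into one combined index achieves the same end and also reduces the total index count.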
What does the load profile look like? Are you primarily running queries or are you also performing inserts and updates? What is the ratio?
As you are in a virtualised environment, it would also make sense to ensure that your storage is not a bottleneck, e.g. by running iostat -x on the nodes while they are under load.
We are mainly using the indices for our search application. Indexing is done through a nightly job, and a new index is created every day. We don't do inserts and updates as of now, but we plan on moving to real-time indexing in the near future. We are using Elastic Cloud (hosted).
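Since the nightly job creates a new index every day, one option worth considering is an index template, so that each day's index is created with a single primary shard from the start; the template name and pattern below are made up, so adapt them to your naming scheme:

PUT _index_template/product-search-template
{
  "index_patterns": ["product-search-*"],
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.number_of_replicas": 2
    }
  }
}

New indices matching the pattern would then pick up these settings automatically, avoiding the need to shrink after the fact.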
