Hi,
I am running a time-series Elasticsearch cluster (on top of the AWS service).
Using a template, I create a daily index: 5 shards, 2 replicas, on 10 nodes + 4 masters.
Once the number of clients grew, everything stopped working: maximum CPU went to 100%, while average CPU stayed low (~40%).
My guess is that most searches hit the latest days, so the load concentrates on the nodes that hold the latest data while the rest stay idle.
My question is: what would be the right scaling mechanism?
I think that by default, in the template, I should set the maximum number of replicas (10), so the latest data has as many replicas as possible.
Once the data becomes old, in a couple of days, I would reduce the number of replicas to 2.
Does this sound like a decent methodology?
Any other recommendations?
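
To make the proposed replica schedule concrete, here is a minimal sketch of it in Python. It only builds the request bodies for the index update-settings API (PUT /<index>/_settings) and sends nothing. The index prefix "logs-", the daily name format, and the 2-day hot window are assumptions for illustration. Note also that with 10 data nodes at most 9 replicas per shard can actually be assigned, since two copies of the same shard are never placed on one node, so the sketch uses 9 rather than 10.

```python
from datetime import date, timedelta

HOT_DAYS = 2        # assumption: indices younger than this count as "latest"
HOT_REPLICAS = 9    # 10 data nodes => 1 primary + at most 9 replicas per shard
COLD_REPLICAS = 2   # replica count from the original template


def replicas_for_age(age_days):
    """Return the replica count for an index that is age_days old."""
    return HOT_REPLICAS if age_days < HOT_DAYS else COLD_REPLICAS


def settings_updates(today, days_back, prefix="logs-"):
    """Build (index_name, settings_body) pairs for the last days_back daily indices.

    The prefix and the YYYY.MM.DD suffix are hypothetical naming conventions.
    Each body is what would be sent to PUT /<index_name>/_settings.
    """
    updates = []
    for age in range(days_back):
        day = today - timedelta(days=age)
        name = prefix + day.strftime("%Y.%m.%d")
        body = {"index": {"number_of_replicas": replicas_for_age(age)}}
        updates.append((name, body))
    return updates


if __name__ == "__main__":
    for name, body in settings_updates(date.today(), 4):
        print(name, body)
```

Something like this would run once a day (for example from cron), with each body sent by whatever HTTP client is already in use.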
It is likely a lot more efficient to use bigger hardware (SSDs, RAM, CPU)
for the new indices and use forced allocation awareness to control where
the indices are located.
I don't know how you'd do that with any cloud service.
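
As a rough illustration of pinning new indices to bigger hardware, the sketch below uses index-level shard allocation filtering (index.routing.allocation.require.<attribute>). This is a related mechanism, not forced awareness itself. The attribute name "box_type" and the tier values "hot" and "warm" are assumptions: they have to match a custom attribute set on each node, and a managed service may not allow setting one. The code only builds settings bodies and sends nothing.

```python
def tier_settings(tier, attr="box_type"):
    """Settings body that requires an index's shards to live on nodes
    whose custom attribute attr equals tier.

    attr and tier are hypothetical names; they must match whatever
    attribute the nodes were actually started with.
    """
    return {"index.routing.allocation.require." + attr: tier}


# New daily indices: put this in the index template so they start on big nodes.
template_settings = tier_settings("hot")

# Older indices: send this to PUT /<index>/_settings to move them off.
aged_settings = tier_settings("warm")

if __name__ == "__main__":
    print(template_settings)
    print(aged_settings)
```

Changing the setting on an existing index makes the cluster relocate its shards to matching nodes, so the move is best scheduled outside peak search hours.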