I'm looking for a solution to store data for years, and ES is preferred for logging, indexing, searching, etc.
I have set up a nice cluster, but search is slow, and I am thinking about autoscaling it when resource usage reaches 90%.
I am not sure what Splunk does. There are a couple of specifics to autoscaling Elasticsearch:
Triggering on the right metric: CPU usage might be one of them, but ES could also be IO or network bound. Using the thread pool queue sizes might be a better choice (see the first sketch below).
Scaling down requires orchestration. Since each data node carries data, when one node is shut down, its data needs to be recovered elsewhere. Vacating the node first is ideal, since it maintains the full set of data copies throughout (see the second sketch below).
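For the queue-based trigger, a minimal sketch using the cat thread pool API (the host is a placeholder); sustained non-zero queue or rejected counts on the search pool are a stronger scale-out signal than raw CPU:

```
# Per-node search thread pool: active threads, queued requests, rejections
curl -s 'http://localhost:9200/_cat/thread_pool/search?v&h=node_name,name,active,queue,rejected'
```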
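And for vacating a data node before shutting it down, a sketch using cluster-level shard allocation filtering (the node name `data-node-3` is hypothetical); once `_cat/shards` shows no shards left on the node, it can be stopped without losing a copy:

```
# Ask the cluster to move all shards off the node about to be removed
curl -s -X PUT 'http://localhost:9200/_cluster/settings' \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.exclude._name": "data-node-3"}}'
```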
But before going that far, I think it is advisable to:
Look into the performance of some of the searches to see if there is anything that can be improved in them or in the way the data is indexed (the Profile API sketch below can help here).
Manually increase the cluster size to get an idea of how big it needs to be and to confirm that it actually cures the search performance issues you are seeing.
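For the first point, one option is the search Profile API; a rough sketch, assuming a hypothetical `logs-*` index and query:

```
# Returns a per-shard breakdown of where query time is actually spent
curl -s 'http://localhost:9200/logs-*/_search' \
  -H 'Content-Type: application/json' \
  -d '{"profile": true, "query": {"match": {"message": "error"}}}'
```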
Er, are your indices currently spread over all the data nodes in your cluster?
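A quick way to check, assuming you can reach the cluster over HTTP:

```
# Shard count and disk usage per data node; a lopsided spread means
# adding nodes won't help until shards are rebalanced
curl -s 'http://localhost:9200/_cat/allocation?v'
```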
And what did you mean specifically by "autoscaling this when the resource is being used by 90%"? Which resource, and how are you measuring it? Is your search IO, CPU or network bound, and if you know, how did you find out? And if you don't know, that's the first thing you need to find out!
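As a starting point for that, a sketch using the cat nodes and node stats APIs (OS-level tools like iostat are usually needed too):

```
# Per-node CPU, load, heap and disk while a slow search is running
curl -s 'http://localhost:9200/_cat/nodes?v&h=name,cpu,load_1m,heap.percent,disk.used_percent'

# More detail: OS, process and filesystem stats per node
curl -s 'http://localhost:9200/_nodes/stats/os,process,fs?pretty'
```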
Also, where would you scale out to? Are you using a cloud provider like Elastic Cloud, AWS, etc.? Scaling out costs more $$$, so make sure you understand the economics. If you are using your own DC/hardware, more hardware also costs $$$.
In my personal experience, the lowest-hanging fruit is to do the analysis suggested by Hemming.