We switched to a self-managed, AWS-based Elasticsearch cluster in November last year, using fairly large instances with NVMe-based disks. All in all, we were impressed by the performance. Searches were instant.
Now, 6 months later, performance of the same cluster is far from what it was. Our ingestion rate has grown a bit, but not shockingly so. We shift old data (the stuff we can't delete for various reasons) to "cold nodes" and only keep "new" indices on hot nodes. This means that the amount of data on the hot nodes has been fairly stable during the last 6 months.
When poking around in the "monitoring" pane in Kibana, I'm noticing a high search rate against "cold" indices as well. I guess I'd like to understand how Elasticsearch "filters out" indices when deciding where to direct a query, so to speak. My theory right now is that cold nodes get the search requests as well, and since these are storing more and more data, the net result is that searches grow slower and slower. I don't know how smart Elasticsearch is in this regard. Most of our searches are log-type searches in Kibana based on date, and our index names are "date-formatted".
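To make "filtering out indices" concrete, here is a minimal Python sketch of what narrowing a search by index name would look like if done by hand on the client side. It assumes one index per day named like `logs-2019.05.30`; the `logs-` prefix, the date format, and the function names are hypothetical, not our actual setup. Whether Elasticsearch or Kibana does something equivalent on its own for a date-range query is exactly what I'm unsure about.

```python
from datetime import date, timedelta
from typing import List


def daily_indices(start: date, end: date, prefix: str = "logs-") -> List[str]:
    """Return one index name per day in [start, end], inclusive.

    Assumes daily indices named <prefix>YYYY.MM.DD (hypothetical pattern).
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [
        f"{prefix}{(start + timedelta(days=n)):%Y.%m.%d}"
        for n in range(days + 1)
    ]


def search_target(start: date, end: date, prefix: str = "logs-") -> str:
    """Join the index names with commas, the form accepted in a search URL path."""
    return ",".join(daily_indices(start, end, prefix))


if __name__ == "__main__":
    # Example dates only; a three-day window yields three index names.
    print(search_target(date(2019, 5, 30), date(2019, 6, 1)))
```

The resulting string could be used in place of a wildcard such as `logs-*` in the search path (`GET /<target>/_search`), so that only the listed indices, and the nodes holding them, receive the request.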
Memory-wise, all hot nodes show a nice sawtooth pattern for their GC and (just looking at the graphs in Kibana) seem to be GCing around every 8 minutes.
I guess I'm trying to understand how performance can degrade when neither the volume of data going into the cluster nor the volume of searches against it has changed drastically. The only "change" I can think of is the ever-increasing size of data on cold nodes (99% of searches are for data that isn't on cold nodes).
Any pointers appreciated.