We are running Elasticsearch 2.2.0 and need to serve about 80K search queries per second (QPS) with a p99 latency of 180 ms.
The current cluster has 40 nodes, each with 12 cores and 24 GB of memory (12 GB heap). The index holds 150K documents in a single shard with 39 replicas, so each node holds one copy. The data is time series: we periodically update the same 150K documents (same ids) with new values. Every document has a TTL attached, ranging from a minimum of 30 minutes to a maximum of 9 hours.
The issue is that search latency degrades as time goes on. So far we have traced this to an increase in the number of segments. Force merging down to 1 segment improves latency, but only for the next 30-40 minutes, after which the number of segments climbs back to 12-15 per node (~600 across the cluster).
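For concreteness, the force merge and the per-index segment check can be issued along the lines of the sketch below. The host `http://localhost:9200` and the index name `timeseries` are placeholders rather than our real values, and the snippet only builds the requests without sending them.

```python
import urllib.request

ES_HOST = "http://localhost:9200"  # placeholder host (assumption)
INDEX = "timeseries"               # hypothetical index name (assumption)


def force_merge_request(host, index, max_num_segments=1):
    """Build a POST request for the _forcemerge API, merging down to max_num_segments."""
    url = "{}/{}/_forcemerge?max_num_segments={}".format(host, index, max_num_segments)
    return urllib.request.Request(url, method="POST")


def cat_segments_url(host, index):
    """Build the _cat/segments URL that lists the segments of one index."""
    return "{}/_cat/segments/{}?v".format(host, index)


# Only print what would be sent; pass the request to urllib.request.urlopen to execute it.
req = force_merge_request(ES_HOST, INDEX)
print(req.get_method(), req.full_url)
print("GET", cat_segments_url(ES_HOST, INDEX))
```

Counting the rows returned by the `_cat/segments` call before and after the merge is one way to watch how quickly the segment count grows back.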
Is there any configuration that would help with a setup like this? Our refresh interval is currently set to 30s and could be increased to as much as 5 minutes.
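As a point of reference, raising the refresh interval is a dynamic index setting change, roughly as in the sketch below. The host and the index name `timeseries` are again placeholders, the `5m` value is simply the upper bound mentioned above, and the snippet only builds the request without sending it.

```python
import json
import urllib.request

ES_HOST = "http://localhost:9200"  # placeholder host (assumption)
INDEX = "timeseries"               # hypothetical index name (assumption)


def refresh_interval_request(host, index, interval):
    """Build a PUT request that updates index.refresh_interval on one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        "{}/{}/_settings".format(host, index),
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Only print what would be sent; pass the request to urllib.request.urlopen to execute it.
req = refresh_interval_request(ES_HOST, INDEX, "5m")
print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```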