Optimising Elasticsearch for timeseries data with high search query QPS

Hey,

We are using Elasticsearch 2.2.0 with an expected search load of 80K QPS and an expected p99 latency of 180 ms.

The current cluster has 40 nodes, each with 12 cores and 24 GB memory (12 GB heap size). The data is 150K documents in one shard with 39 replicas. It is timeseries data: we periodically update the 150K docs with new values but keep the same ids. All the docs have a TTL attached to them, with 30 mins being the minimum TTL and 9 hours being the max TTL.

The issue is that ES latency degrades as time goes on. So far we have figured out that this is because of an increase in the number of segments. Force merging the segments down to 1 improves latency, but only for the next 30-40 mins, after which the number of segments goes back up to 12-15 per node and ~600 across the cluster.

Is there any configuration that will help with a setup like this? Our refresh interval is set to 30s, and can be increased up to 5 mins.

This is a very unusual scenario, but if you are fine serving somewhat stale data, which seems to be the case given that you are willing to increase the refresh interval, you could perhaps try something like this:

  • Create a master index (basically the current one) that you continuously update, and set this to have only one replica.
  • Then create an alias which you will query through. This will point to a special read index that you will periodically create.
  • Every X minutes you reindex the current content of the master index into a new read index (with 0 or 1 replica configured). Once reindexing has completed you forcemerge this down to a single segment, and once that has completed you increase the number of replicas to 39. Once each node has a replica you point the read alias to this new index and delete the old read index.
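The steps above could be sketched roughly as the following sequence of REST calls. This is only an illustration under assumptions, not a tested procedure: the index and alias names are hypothetical, the `_reindex` endpoint only exists from Elasticsearch 2.3 onwards (on 2.2.0 the copy step would have to be done with scan/scroll plus bulk indexing instead), the 10 minute health timeout is an arbitrary placeholder, and how document TTLs carry over to the copy is not addressed.

```python
import json


def build_rotation_plan(master, new_read, old_read, alias, replicas=39):
    """Return one rotation as an ordered list of (method, path, body) tuples.

    All names passed in are hypothetical; nothing is sent to a cluster here.
    """
    return [
        # 1. Create the new read index without replicas so copy and merge stay cheap.
        ("PUT", "/" + new_read,
         {"settings": {"number_of_shards": 1, "number_of_replicas": 0}}),
        # 2. Copy the master index (assumes Elasticsearch 2.3+ for _reindex).
        ("POST", "/_reindex",
         {"source": {"index": master}, "dest": {"index": new_read}}),
        # 3. Merge the copy down to a single segment.
        ("POST", "/" + new_read + "/_forcemerge?max_num_segments=1", None),
        # 4. Fan the merged shard out to every node.
        ("PUT", "/" + new_read + "/_settings",
         {"index": {"number_of_replicas": replicas}}),
        # 5. Wait until all replicas of the new index are allocated.
        ("GET", "/_cluster/health/" + new_read
         + "?wait_for_status=green&timeout=10m", None),
        # 6. Swap the alias in a single request so searches never miss.
        ("POST", "/_aliases",
         {"actions": [
             {"remove": {"index": old_read, "alias": alias}},
             {"add": {"index": new_read, "alias": alias}},
         ]}),
        # 7. Drop the previous read index.
        ("DELETE", "/" + old_read, None),
    ]


if __name__ == "__main__":
    for method, path, body in build_rotation_plan(
            "master", "read-0002", "read-0001", "read"):
        print(method, path, json.dumps(body) if body is not None else "")
```

Each step would need to succeed before the next one is sent; in particular, the alias swap should only happen after the health check reports green for the new index.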

This will allow you to constantly query a forcemerged index while handling updates in the background. How often you can rotate in a fresh read index will depend on how long the reindexing and forcemerge process takes. This may also allow you to reduce the refresh interval on the master index.
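To get a feel for that trade-off, here is a rough back-of-the-envelope helper. It assumes rotations never overlap and that each read index is a snapshot of the master taken when its build starts; the function name and the durations in the usage lines are made up for illustration and would have to be replaced with timings measured on the real cluster. It also ignores the refresh interval of the master index, which adds to the staleness.

```python
def rotation_timing(reindex_s, forcemerge_s, replicate_s, interval_s=None):
    """Estimate build time and worst-case staleness for one read-index rotation.

    All inputs are durations in seconds. If interval_s is omitted, rotations
    are assumed to run back to back.
    """
    # Time from starting the copy until the new index can take the alias.
    build_s = reindex_s + forcemerge_s + replicate_s
    if interval_s is None:
        interval_s = build_s
    if interval_s < build_s:
        raise ValueError("interval is shorter than the time one rotation takes")
    # A snapshot goes live build_s after it was taken and keeps serving
    # until the next one goes live, interval_s later.
    return {
        "build_s": build_s,
        "interval_s": interval_s,
        "max_staleness_s": interval_s + build_s,
    }


# Hypothetical timings: 60 s reindex, 30 s forcemerge, 90 s replica fan-out.
back_to_back = rotation_timing(60, 30, 90)
every_five_min = rotation_timing(60, 30, 90, interval_s=300)
```

With these made-up numbers, back-to-back rotation gives a worst-case staleness of 360 seconds, and rotating every 5 minutes gives 480 seconds.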
