Reducing GC time and other cluster performance issues

I currently run an 11-node cluster (5 of which are data nodes) holding about 58,000 shards and roughly 5TB of data. We are running Elasticsearch 1.7.1 ATM. We are seeing very long garbage collection times (40-60 seconds) on a lot of our data nodes. This is causing certain API functions, like searching, to time out, and the cluster falls into a funky state. We encountered this a few months back and solved it by scaling from 3 data nodes to 5. Right now, scaling out additional data nodes is a slow process (it takes a while to get the hardware). So I have a few questions:

  1. Is there a way to decrease GC time, outside of scaling data nodes?
  2. Is there a way to capture something like 'last GC duration per node' from Elasticsearch's API? Right now we throw everything from /_nodes/stats into Grafana, but the only GC stats there are cumulative ('total time spent in GC' and 'GC count'), neither of which tells us how long an individual collection took. It would be nice to have an API call for it, so we can alert when it crosses a threshold. The only place I can find it is the log files. (The part of the stats response we currently pull from is sketched after this list.)
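For context, this is roughly the GC portion of the nodes stats response on a 1.x cluster; the node ID and numbers below are made up. The per-node fields are cumulative (collection_count and collection_time_in_millis under jvm.gc.collectors), so the closest approximation we have found is taking deltas between scrapes and dividing time by count:

GET /_nodes/stats/jvm?human

{
  "nodes": {
    "node_id_here": {
      "jvm": {
        "gc": {
          "collectors": {
            "young": {
              "collection_count": 123456,
              "collection_time": "1.2h",
              "collection_time_in_millis": 4320000
            },
            "old": {
              "collection_count": 789,
              "collection_time": "45.6m",
              "collection_time_in_millis": 2736000
            }
          }
        }
      }
    }
  }
}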

That is a VERY large number of shards for a cluster that size. If I have calculated it correctly, you have around 11,600 shards per data node, and the average shard size is only around 90MB. As each shard is a Lucene index and carries with it a certain amount of overhead in terms of file handles and memory, I would suspect this is behind at least part of your problem. I would recommend consolidating indices in order to bring the number of shards down to a manageable level.

We are working on using Curator to set up data retention policies, which will in turn reduce the total number of shards we carry. The main issue is that we use Logstash for all of our data, and by default we create an index per day per 'platform' that logs data into Elasticsearch. This can't easily be changed or collapsed with our current setup.
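(For what it's worth, at the API level a daily retention job like that just deletes indices past the cutoff; assuming a hypothetical naming scheme of logstash-<platform>-YYYY.MM.DD, it boils down to calls like the one below, which is essentially what Curator automates.)

DELETE /logstash-someplatform-2015.06.01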

As the average shard size is so small, I would recommend changing the index template to create only one shard per index, unless that is likely to make shards larger than 50GB (if you have not already done so). In most logging use cases I come across, we aim for shards that are between a few gigabytes and a few tens of gigabytes in size.
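If you are using index templates to control this, the change is roughly the following (the template name here is a placeholder, and the pattern should match however your indices are actually named):

PUT /_template/single_shard_default
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  }
}

A template like this only affects indices created after it is put in place; existing indices keep their current shard count.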

The main issue with that is that I have different customers; some have kilobytes per shard, some have 20GB per shard. We procedurally generate these indexes ATM, so it is very difficult to set shard counts manually. Also, correct me if I am wrong, but I can't change the number of primary shards for an index after it has been created, right? So the current indexes will still contribute to this issue.

Right now each index is set up with 5 primary shards and 2 replicas. Should I reduce it to 2 or 3 primaries per index, or go all the way to 1?

I would consider going all the way down to 1 for all but the largest indices to limit the number of shards going forward. If volumes are reasonably steady, you could consider the previous day's shard size when generating the new ones. Although it is not possible to change the shard count of existing indices without reindexing, changing to just 1 replica can be done at any point and would quickly reduce the shard count across the board.
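For example, dropping to 1 replica on existing indices is a single live settings update (scope the index pattern to whatever you actually want to change):

PUT /logstash-*/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

New indices will still pick up whatever replica count your template specifies, so update that as well.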

What's the disadvantage of decreasing the number of shards per index?

Hi,

If you have 2 replicas, I suggest decreasing that to 1. For every shard, some amount of memory needs to be set aside to make the shard searchable, so this is an easy way to reduce the amount of memory used by Lucene segments. The way to reduce GC time is to reduce memory pressure. You should check where most of the memory is being allocated, but I'd recommend doing things like the following in your Logstash template (a rough sketch of the template changes follows the list):

  1. Disable _all and set index.query.default_field to "message" in your Logstash template.
  2. Enable doc values on any field that is not an analyzed string. This will reduce the memory needed for fielddata.
  3. Make sure all your index names start with logstash-*, or specify a template for each index pattern you use. The stock Logstash template does things like omit norms, which reduces the amount of memory required for your indexes, but it only applies to indexes that start with logstash-*.
  4. In place of deleting indexes, you can close them to reduce memory. They will still exist on the file system but won't be searchable and won't use any memory. You can re-open them at any time.
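As a rough sketch, items 1-3 could look something like this in a 1.x index template, with doc values enabled on the not_analyzed .raw sub-fields (the template name is a placeholder, and this is simplified compared to the stock Logstash template):

PUT /_template/logstash_memory_tweaks
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.query.default_field": "message"
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "doc_values": true,
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ],
      "properties": {
        "@timestamp": { "type": "date", "doc_values": true },
        "message": { "type": "string", "index": "analyzed" }
      }
    }
  }
}

Closing an index instead of deleting it (item 4) is likewise a single call, e.g. POST /logstash-2015.06.01/_close, and POST /logstash-2015.06.01/_open brings it back.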

I would monitor the memory usage on your data nodes to see where it's going, using:

GET /_nodes/stats?human

Look at segments and fielddata as a starting point. However, as previously suggested, you have too many shards per node, and reducing that will help.
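To make that concrete, these are roughly the per-node fields worth graphing from that response (numbers are invented):

"indices": {
  "segments": {
    "count": 52000,
    "memory_in_bytes": 9500000000
  },
  "fielddata": {
    "memory_size_in_bytes": 4200000000,
    "evictions": 0
  }
}

If segment memory dominates, fewer shards (and closing old indices) will help the most; if fielddata dominates, doc values and limiting what gets analyzed will give the biggest win.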