Advice on optimizing an Elasticsearch cluster for analytics

Hey everyone, writing in to ask for advice on scaling an Elasticsearch cluster used for analytics - that is, aggregations only, no searches. The TL;DR is that we're seeing high CPU usage during a cardinality aggregation query and are trying to scale the cluster to handle it.

The cluster currently consists of 5 m3.2xlarge instances; the spec per instance is as follows:

  • 8 CPUs
  • 30 GB of RAM (15 GB of which is allocated to the JVM heap)
  • an EBS volume

Regarding dataset sizes: an index is created per week, with 6 primary shards and 1 replica each. Right now there are around 30,000 active shards on the cluster - 15k primaries, 15k replicas. The active dataset for the main analytics scenarios consists of about 42 indices with 213,803,714 documents; the total size is about 26 GB.

The query I'm trying to optimize is based on a cardinality aggregation on a high-cardinality string field. The documents being aggregated are first filtered.
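For reference, the query shape is roughly as follows - a minimal sketch using the Python client, where the index name, field names, and filter value are made-up placeholders rather than our real mapping:

```python
# Minimal sketch of the query shape: a filtered document set feeding a
# cardinality aggregation on a high-cardinality string field.
# Index/field names and the filter value are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

body = {
    "size": 0,  # only the aggregation result is needed, not the hits
    "query": {
        "filtered": {  # 1.x-style filtered query
            "filter": {"term": {"customer_id": "some-customer"}}
        }
    },
    "aggs": {
        "unique_values": {
            "cardinality": {
                "field": "session_id",       # high-cardinality string field
                "precision_threshold": 3000  # trades heap for accuracy
            }
        }
    },
}

response = es.search(index="analytics-2015.10", body=body)
print(response["aggregations"]["unique_values"]["value"])
```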

Here are some metrics on the performance:

  • Queries running in parallel - between 5 and 10
  • 95th percentile latency - 25 seconds
  • CPU load hovers around 90% on all 5 nodes when the queries are being run
  • Heap usage is between 40-60% on all 5 nodes
  • Almost no read operations are seen on the EBS volume, and CPU iowait is constantly 0

These metrics lead me to believe that the fielddata is being cached properly, as are the filters. I'd love some advice on where to go from here, apart from adding more nodes.
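In case it's useful, this is roughly how I've been sanity-checking that assumption - a sketch against the nodes stats API (the exact JSON paths may differ between ES versions):

```python
# Rough sanity check of fielddata cache usage per node via the nodes
# stats API; the response structure may vary slightly between versions.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

stats = es.nodes.stats(metric="indices")
for node_id, node in stats["nodes"].items():
    fielddata = node["indices"]["fielddata"]
    print(node.get("name", node_id),
          "fielddata bytes:", fielddata["memory_size_in_bytes"],
          "evictions:", fielddata["evictions"])
```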

Thanks!

That's way too many shards, you really need to reduce that!

Hi Mark,

Thanks for the advice. Do you have a ballpark target for shards per node?

I should say that I've already reduced the number of shards without any noticeable performance increase.

There's no single number, it depends on a lot of things.

However, each shard is a Lucene index, which requires a certain amount of resources to run and manage. Having an excessive number of shards eats up heap and other resources that could otherwise be used for actually serving requests in ES.

If I calculated correctly, it seems you have an average shard size of less than 2 MB, which is very, very small. The ideal shard size depends on the use case, but is quite often in the range of a few to tens of gigabytes. The ideal number of shards per node also varies with use case and hardware profile, but typically ranges from tens to possibly a few hundred shards. 6,000 shards per node is definitely too many.
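A quick way to confirm this yourself is the cat shards API - roughly like this (a sketch; the available columns can differ slightly between versions):

```python
# List every shard with its store size to confirm how small the shards
# really are. Column names accepted by _cat/shards can vary by version.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Request only the columns of interest: index, shard number,
# primary/replica flag, node, and on-disk store size.
print(es.cat.shards(h="index,shard,prirep,node,store", v=True))
```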

If you are unable to consolidate your indices and need to keep all 42, I would suggest trying to switch to monthly indices with a single shard per index and seeing how that affects performance. That would give you 1,008 (42 * 2 * 12) shards per year, which is more manageable, although possibly still a bit high. Depending on your retention period, you may want to consolidate even further.
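To make that concrete, an index template along these lines (the template name and the analytics-* pattern are placeholders for your own naming scheme) would give every new monthly index a single primary plus one replica:

```python
# Sketch of a legacy (pre-7.8) index template that gives each new
# monthly index one primary shard and one replica. The template name
# and the "analytics-*" pattern are placeholders.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_template(
    name="analytics-monthly",
    body={
        "template": "analytics-*",  # index name pattern (legacy syntax)
        "settings": {
            "number_of_shards": 1,
            "number_of_replicas": 1,
        },
    },
)
```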

Thanks for taking the time to answer, Christian and Mark. A few more words about our indexing scheme:

  • Separate indexes per customer
  • Separate indexes per time period (currently per week)

This has indeed resulted in a large number of shards; we were not aware of the inefficiencies this introduces.

We'll definitely be moving to monthly indices now; this should reduce the number of shards in the cluster by roughly a factor of four. However, that still leaves us with about 7,500 shards.

Since we'd like to keep customers segregated into separate indices, the only options would be to further increase the time span covered by each index or to reduce the number of primary shards per index.

Is the second option recommended? It essentially means that not all nodes of the cluster will participate in the aggregation queries, and I was under the impression that this would hurt performance.

Would love to hear your opinion - thanks again!

I would still recommend reducing the number of shards per index in order to further limit the total shard count. As you typically have a number of queries running in parallel, covering different time periods and possibly different customers, it is likely that the query load will be spread out even if not all nodes participate in every query.

If you know that your users have a certain query pattern, e.g. some users are more active than others or most queries only cover the last month's worth of data, you could also try increasing the number of replicas for the individual indices that are queried the most, in order to spread the load more evenly.
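The replica count is a dynamic setting, so it can be changed on a live index; roughly like this (the index name is a placeholder):

```python
# Replica count is a dynamic index setting, so heavily queried indices
# can be given extra copies on the fly. The index name is a placeholder.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_settings(
    index="analytics-2015.10",
    body={"index": {"number_of_replicas": 2}},
)
```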

As this tends to be use-case specific, you may want to spin up a separate cluster and use it to benchmark a few different options, to see what works best for you.

Are you aware of the filtering capabilities provided by index aliases? They allow many users to share the same physical Lucene index while each only sees a filtered view of its contents.
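Roughly, it looks like this - a sketch where the alias name and the customer_id field are made up, just to show the shape:

```python
# Sketch of a filtered alias: many customers share one physical index,
# but each alias only exposes that customer's documents.
# The alias name and the customer_id field are made-up examples.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.put_alias(
    index="analytics-2015.10",
    name="analytics-customer-acme",
    body={"filter": {"term": {"customer_id": "acme"}}},
)

# Queries against the alias are then transparently filtered:
es.search(index="analytics-customer-acme", body={"size": 0})
```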

This is not without issues, but I mention it because it's a common approach and I wasn't sure whether you were aware of it.

@Christian_Dahlqvist - Alright, thanks. I am actually testing these scenarios on a benchmarking cluster. Will report back with results :smile:

@Mark_Harwood - yeah, I'm aware of that, but as is so often the case I'm pressed by a deadline and would prefer not to make many changes to application code right now :slight_smile: I'll keep it in mind if performance issues remain once we're done consolidating shards in other ways.

Thank you both!