Extremely slow nested aggregations, need suggestions on modeling/shards

Hi all,

Im fairly new to ElasticSearch and we're looking to move our analytics from SQL to ElasticSearch.

I'm using AWS ElasticSearchService and here are some of the stats from the test data I indexed:

Index: Time based(YYYY-MM-DD), like logstash, but data is not static
~500 Million documents
Largest Index: ~32GB (We predict our production data will be more or less close to this size for all indices)
Number of Documents: ~10 million in the above index
Cluster: 3 dedicated master nodes, 7 X r4.8xlarge data nodes
Shards: 5 per index
Replicas: 2 per index
doc_values are enabled on almost 90% of the fields

The data is basically financial data with a lot of numbers and some text fields (think invoices). Due to confidentiality reasons I cannot divulge the mappings/structures, but our use-case mainly boils down to a lot of nested aggregations and then sum/avg etc in the end on a particular date range (so basically over multiple indices).

However, any nested aggregations more than 3-4 levels (most of our requirements require an avg of 4 levels of nesting) takes the query around 30+ seconds to execute.

I was mainly testing on a single index(the largest), and the queries are mostly search with the nested aggregations.
Initially I was not using any filters, but on applying multiple terms filters on the terms aggregation fields shows some performance (down to 850ms) improvement, but I would ideally not want to use these filters.

My questions are:

  • Is my bottleneck the number of shards per index ? I had some success with more replicas, but the responses are still in the 30s range
  • Would it be better to filter and then do multiple nested aggregations ?

Thanks in advance and apologies for the vagueness.