Significant terms aggregation on a large dataset

I would like to run a significant terms aggregation across a lot of data. I currently have about 1 TB of data in roughly 5 million top-level documents, each of which has a few hundred nested documents. The field I want to run the significant terms aggregation on is in the nested documents. The dataset is going to continue to grow.
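For context, this is roughly the shape of the request I am running (the index name, nested path, field names, and the foreground query below are placeholders, not my real mapping):

```json
GET my-index/_search
{
  "size": 0,
  "query": {
    "term": { "some_field": "foreground-value" }
  },
  "aggs": {
    "nested_docs": {
      "nested": { "path": "events" },
      "aggs": {
        "significant_keywords": {
          "significant_terms": { "field": "events.keyword_field" }
        }
      }
    }
  }
}
```

The query selects the foreground set, and the significant terms aggregation runs on a keyword field inside the nested documents.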

At the moment my significant terms aggregation is timing out against the AWS Elasticsearch Service hard limit of 60 seconds, although I can see the underlying task keeps running for about 90 seconds. This happens even when I use partitions. I don't mind the aggregation taking a long time (as long as my cluster stays available for other queries), since it only needs to run a few times per day and is not user-facing, but it would provide a lot of value. I also don't mind moving off AWS Elasticsearch Service, or increasing the RAM on my machines, if that is what it takes.
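For reference, the partitioned version of the request looks roughly like this (again with placeholder names; I am assuming the same `include`/`num_partitions` syntax as the terms aggregation, and I issue one request per partition from 0 up to num_partitions - 1):

```json
GET my-index/_search
{
  "size": 0,
  "query": {
    "term": { "some_field": "foreground-value" }
  },
  "aggs": {
    "nested_docs": {
      "nested": { "path": "events" },
      "aggs": {
        "significant_keywords": {
          "significant_terms": {
            "field": "events.keyword_field",
            "include": { "partition": 0, "num_partitions": 20 },
            "size": 50
          }
        }
      }
    }
  }
}
```

Even with the term space split across 20 partitions like this, each individual request still runs past the 60-second limit.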

Mostly I am wondering if this is just a stupid thing to attempt. If it's not stupid, what do I likely need to do to make it work?
