This area is complex because distributed systems make things complex.
If you can work with a single shard/index that makes analytics more accurate but brings its own problems in terms of scale.
Assuming you stick with multiple shards/indices you'll likely benefit from the shard_size setting covered in the docs. That can increase accuracy which you can check by looking at the doc_count_error_upper_bound value in the results. When it's zero your results should be accurate. The size parameter has a similar effect but with the disadvantage that it increases the number of final results returned - many of which you may be uninterested in.
If you hit the "too many buckets" issue you've pushed size/shard-size too hard and are trying to squeeze too much into your responses. You then need to back up and try break your analysis into multiple requests. That could be done using queries as you suggest or using partitioning in the terms aggregation.
Here's a sketch of a decision process that might help.
