How to choose size parameter when using partitioning

Hello,
How should I choose the size parameter when using partitioning query ?

E.g
Suppose the aggregation query result returns 60K hits and say I use num_partitions = 20, then is it correct to expect the total result is spread across 20 partitions, so I can specify the size as 3000 (60k/20) when making 20 queries with partition id 0 through 19 ?

e.g. [ showing the first aggregation block only ]

"aggs": {
"ip": {
"terms": {
"field": "ip",
"include": {
"partition": 0,
"num_partitions": 20
},
"size": 3000
},

Similarly - if I use num_partitions = 200, then should I use size = 300 and run 200 queries to retreive the 60k response ?

Thanks a lot for your insight.
best regards
Anindya

A related but different question:
I am using the current default 5 shards) for the index.
What is the recommended best practice in terms of choosing the number of partitions in relations with number of shards in the index ?
More partitions will translate to more query (i.e. more IOPs) with smaller results.
vs running smaller number of larger individual queries by using less number of partitions.

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.