Hello,
How should I choose the size parameter when using a partitioned terms aggregation?
For example: suppose the aggregation returns 60K unique terms in total and I use num_partitions = 20. Is it correct to expect the total result to be spread across the 20 partitions, so that I can set size = 3000 (60K / 20) when making 20 queries with partition IDs 0 through 19?
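For concreteness, here is roughly what I have in mind, as a minimal sketch using the Python elasticsearch client (the index name "my-index" and the field "account_id" are placeholders for my real ones):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

NUM_PARTITIONS = 20
SIZE = 3000  # ~60K expected unique terms / 20 partitions

all_buckets = []
for partition in range(NUM_PARTITIONS):
    resp = es.search(
        index="my-index",
        body={
            "size": 0,  # only the aggregation results are needed, not hits
            "aggs": {
                "accounts": {
                    "terms": {
                        "field": "account_id",
                        # each request fetches one slice of the term space
                        "include": {
                            "partition": partition,
                            "num_partitions": NUM_PARTITIONS,
                        },
                        "size": SIZE,
                    }
                }
            },
        },
    )
    all_buckets.extend(resp["aggregations"]["accounts"]["buckets"])
```

My understanding is that partitioning assigns terms to partitions by hashing, so each partition should hold roughly, but not exactly, 60K / 20 terms; presumably setting size with some headroom above the exact quotient is safer.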
A related but different question:
I am using the current default (5 shards) for the index.
What is the recommended best practice for choosing the number of partitions relative to the number of shards in the index?
More partitions translate into more queries (i.e., more IOPS), each returning smaller results, versus running a smaller number of larger individual queries by using fewer partitions.
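For reference, the way I am currently estimating the total number of unique terms up front is with a cardinality aggregation, and then deriving num_partitions from a target per-partition size. A minimal sketch (again, "my-index", "account_id", and the target of 3000 terms per request are placeholder assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="my-index",
    body={
        "size": 0,
        "aggs": {
            # cardinality gives an approximate count of unique terms
            "unique_accounts": {"cardinality": {"field": "account_id"}}
        },
    },
)
unique_terms = resp["aggregations"]["unique_accounts"]["value"]

TERMS_PER_PARTITION = 3000  # desired bucket count per request
num_partitions = -(-unique_terms // TERMS_PER_PARTITION)  # ceiling division
print(f"~{unique_terms} unique terms -> num_partitions = {num_partitions}")
```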