I am trying to decide the number of primary shards for my use case. I am using custom _routing.
I need to choose a number of primary shards so that all my content is evenly distributed across the shards.
My current issue: when I index data using multiple threads and all of that data has the same _routing value, it all ends up in the same shard, and I am losing data.
My data is divided into several groups. Each group has a two-character identifier. My _routing is not based on any field in my documents.
I expect each group to be stored on a single shard.
The two-character identifier serves as the _routing value when indexing.
At search time I know the identifier, so searching is easy.
I also expect my groups to be evenly divided across my shards. Hence I need to decide on an optimum number of primary shards that works well for indexing and is also beneficial for searching.
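One thing that may help when picking a shard count: Elasticsearch does not place routing values round-robin. It hashes the routing value and reduces it modulo the number of shards (the exact formula varies between versions), so several groups can share a shard while another shard gets fewer. The sketch below only illustrates that modulo mapping for a few candidate shard counts. It assumes every two-letter lowercase pair is a group identifier (hypothetical), and it uses CRC32 as a stand-in hash because Elasticsearch's own Murmur3-based routing hash is not reproduced here. The concrete shard numbers will therefore not match a real cluster; only the shape of the distribution is meant to be illustrative.

```python
import zlib
from collections import Counter
from itertools import product
from string import ascii_lowercase


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash (CRC32 of the UTF-8 bytes), NOT Elasticsearch's Murmur3 routing hash.
    # It only demonstrates "hash(routing) modulo shard count": the same routing value
    # always lands on the same shard, which is why one group stays on a single shard.
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def groups_per_shard(routings, num_primary_shards):
    # Count how many distinct groups map to each shard.
    counts = Counter(shard_for(r, num_primary_shards) for r in routings)
    # Report every shard, including any that received no groups at all.
    return [counts.get(s, 0) for s in range(num_primary_shards)]


if __name__ == "__main__":
    # Hypothetical group identifiers: every two-letter lowercase pair ("aa" .. "zz").
    groups = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
    for n in (3, 5, 8, 12):
        dist = groups_per_shard(groups, n)
        print(f"{n:>2} shards: min={min(dist)} max={max(dist)} per-shard={dist}")
```

Note that even if the number of groups per shard comes out balanced, the number of documents per shard will only be balanced if the groups are roughly the same size; one large group still lands entirely on one shard.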