Another question: I have 8 machines for 2048G data. Which design is better? 1) 8 shards with each 256G, so each machine has one shard; 2) 70 shards, so each machine has about 8 shards.
I know there are 2 best practices: keep shard at 30G and one machine has one shard. In my case, I need the best performance. Which one is better?
Are you optimising for query latency or query throughput or a combination of the two? The optimal shard count will depend on your use case, hardware, data and queries, so I would recommend you run a benchmark to find out.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.