I'm planning to change my shard and replica settings because the current ones aren't reasonable.
I have 5 types of indices, sized 30MB, 2GB, 25GB, 35GB, and 200GB, and each type has 2 sets (a live index and an indexing index).
Currently, every index is set to 2 shards and 5 replicas.
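For reference, Elasticsearch treats these two settings differently. `number_of_replicas` is a dynamic setting that can be changed on an existing index through the update index settings API (`PUT /<index>/_settings`). `number_of_shards` is fixed when the index is created, so changing it means creating a new index and reindexing (or using the split/shrink APIs). The minimal sketch below only builds the JSON request bodies and does not talk to a cluster. The index name `my-index-live` is hypothetical.

```python
import json


def create_index_body(shards: int, replicas: int) -> dict:
    """Body for PUT /<index> when creating a new index.

    number_of_shards can only be set at creation time; it cannot be
    changed later on the same index.
    """
    return {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }


def update_replicas_body(replicas: int) -> dict:
    """Body for PUT /<index>/_settings on an existing index.

    number_of_replicas is dynamic, so it can be changed at any time.
    """
    return {"index": {"number_of_replicas": replicas}}


if __name__ == "__main__":
    index = "my-index-live"  # hypothetical index name
    print(f"PUT /{index}")
    print(json.dumps(create_index_body(shards=1, replicas=1), indent=2))
    print(f"PUT /{index}/_settings")
    print(json.dumps(update_replicas_body(replicas=2), indent=2))
```

Because replicas can be adjusted later without reindexing, it is usually the shard count that needs the most care up front.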
As I learned here, each shard should be around 20-50GB.
So this is my plan:
30MB, 2GB index: 1 shard
25GB, 35GB index: 2 shards
200GB index: 5 shards
Is this reasonable?
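The plan above is mostly arithmetic: divide each index size by its shard count and compare the result with the 20-50GB guideline. A minimal sketch, assuming data spreads evenly across primary shards and treating 30MB as 0.03GB. It ignores future growth, replicas, and query load, so it is only a first sanity check.

```python
import math

# Upper end of the 20-50GB per-shard guideline quoted in the post.
MAX_SHARD_GB = 50.0


def min_primary_shards(index_gb: float, max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return max(1, math.ceil(index_gb / max_shard_gb))


def shard_size_gb(index_gb: float, shards: int) -> float:
    """Average primary shard size, assuming data is spread evenly."""
    return index_gb / shards


# Index sizes (GB) and the planned shard counts from the question.
planned = {0.03: 1, 2: 1, 25: 2, 35: 2, 200: 5}

if __name__ == "__main__":
    for size, shards in planned.items():
        print(
            f"{size:>6} GB: planned {shards} shard(s) -> "
            f"{shard_size_gb(size, shards):.2f} GB each; "
            f"minimum under {MAX_SHARD_GB:.0f} GB cap: {min_primary_shards(size)}"
        )
```

With these numbers the 200GB plan gives 40GB shards, inside the range. The 25GB and 35GB plans give 12.5GB and 17.5GB shards, slightly under 20GB, and a single shard would also stay within the cap for those two. The 30MB and 2GB indices can't reach 20GB at all, so 1 shard is the natural choice for them.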
I've heard that the number of replicas affects search performance, so I'm wondering what number of replicas is reasonable when service traffic is over 2,000 per second in general and over 10,000 per second during certain periods.
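One way to reason about replicas is to count shard copies. Every copy of a shard can serve searches, and Elasticsearch routes each search to one copy of each shard, so more copies can spread search load. However, every copy also has to index every write, and Elasticsearch never places two copies of the same shard on the same node. A shard with R replicas therefore needs at least R + 1 nodes before all of its copies can be assigned. The sketch below only does this bookkeeping. It is not a throughput model, and the replica counts it prints are illustrative, not recommendations. The node count of 20 comes from later in this thread.

```python
def total_shard_copies(primaries: int, replicas: int) -> int:
    """Every primary shard plus each of its replicas."""
    return primaries * (1 + replicas)


def fully_allocatable(replicas: int, nodes: int) -> bool:
    """Copies of one shard must sit on different nodes, so R replicas need R + 1 nodes."""
    return replicas + 1 <= nodes


def copies_per_node(primaries: int, replicas: int, nodes: int) -> float:
    """Average number of shard copies per node if they are spread evenly."""
    return total_shard_copies(primaries, replicas) / nodes


NODES = 20  # node count mentioned later in the thread; adjust for your cluster

if __name__ == "__main__":
    # (2, 5) is the current setting; the others are illustrative only.
    for primaries, replicas in [(2, 5), (1, 2), (2, 9)]:
        print(
            f"{primaries} primaries x {replicas} replicas: "
            f"{total_shard_copies(primaries, replicas)} copies, "
            f"{copies_per_node(primaries, replicas, NODES):.2f} per node, "
            f"fully allocatable: {fully_allocatable(replicas, NODES)}"
        )
```

Whether extra copies actually raise search throughput at 2,000 or 10,000 per second depends on hardware, queries, and indexing load, which only a benchmark can answer.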
The right shard and replica count depends on a lot of factors, including document size, the number and types of queries you'll be running, mappings, etc. The best way we can recommend to find the ideal sharding strategy for your use case is to benchmark your data with the kinds of indexing and search load you expect.
I've read the page you linked several times, but it is still difficult to decide.
All that aside:
If I have 20 nodes and an index with 30GB of primary data, which setup would give better search or indexing performance?
Or at least, what would you choose if you were me?
My question is: what is the main factor in deciding between more shards and more replicas for better performance?
Of course, I know there are many factors, but I still have to choose a balance.
The linked page includes a section entitled "Aim for shards of up to 200M documents, or with sizes between 10GB and 50GB", which I think answers your question. With 4 shards, each shard would be less than 10GiB, which is smaller than recommended. 2x15GiB shards would be within the recommended range, but so would 1x30GiB shard.
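The arithmetic in that answer can be checked directly. A minimal sketch that splits a 30GiB index across 1, 2, or 4 primary shards and tests each size against the 10-50GB range from the linked section. It assumes data spreads evenly and only checks size, not the 200M-document limit the same section mentions.

```python
# Range from the linked section: shards between 10GB and 50GB.
MIN_SHARD_GIB = 10.0
MAX_SHARD_GIB = 50.0


def within_guideline(index_gib: float, shards: int,
                     lo: float = MIN_SHARD_GIB, hi: float = MAX_SHARD_GIB):
    """Return (average shard size, whether it falls inside [lo, hi])."""
    size = index_gib / shards
    return size, lo <= size <= hi


if __name__ == "__main__":
    for shards in (1, 2, 4):
        size, ok = within_guideline(30.0, shards)
        print(f"{shards} shard(s): {size:.1f} GiB each, within range: {ok}")
```

This reproduces the answer: 1x30GiB and 2x15GiB are both inside the range, while 4 shards of 7.5GiB fall below it. Choosing between the two valid options still comes down to benchmarking.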
Almost certainly not this. Or at least, some workloads will get better results with 1x30GiB primary shard and others might do better with 2x15GiB primary shards. There's no way to be sure without benchmarking both setups using your specific workload and data.
Yes, definitely. My options didn't properly represent the point.
Anyway, since you've all said it depends on various factors, I'll check whether I can set up and run Rally against our running service.