This is just one index.
I have 5 nodes, and the index is set to 5 shards with one replica.
My shards are large (120-180 GB each), and CPU on my nodes is consistently below 7% (they have 8 cores).
If I double or even triple my shard count, will that improve search (and possibly indexing) times by taking advantage of the underutilized CPUs? I have a lot of spare CPU capacity.
Also, I have a feeling I'll have to break up the shards anyway, because they are unwieldy at this size.
The key point is that you are likely to be limited by disk IO and memory long before CPU is a concern.
Your indices are pretty large, so indexing speed will be impacted. If this is time-series data, you should consider daily indices (or some other time interval), which are also discussed in that other thread.
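To make the time-interval idea concrete, here's a minimal sketch of how documents get routed to time-bucketed index names (the `logs` base name, the date format, and the helper itself are illustrative assumptions, not anything from your setup):

```python
from datetime import datetime, timezone

def index_for(ts: datetime, base: str = "logs") -> str:
    """Route a document timestamp to a daily index, e.g. logs-2015.06.01.

    With daily indices you query only the buckets a search actually needs,
    and old buckets can be dropped wholesale instead of deleting documents.
    """
    return f"{base}-{ts:%Y.%m.%d}"

def index_for_6h(ts: datetime, base: str = "logs") -> str:
    """Same idea at 6-hour granularity, e.g. logs-2015.06.01-12."""
    bucket = (ts.hour // 6) * 6  # floor to 00, 06, 12, or 18
    return f"{base}-{ts:%Y.%m.%d}-{bucket:02d}"

t = datetime(2015, 6, 1, 14, 30, tzinfo=timezone.utc)
print(index_for(t))     # logs-2015.06.01
print(index_for_6h(t))  # logs-2015.06.01-12
```

The trade-off is the same one you're weighing below: finer buckets mean smaller, more manageable shards, but also more shards overall for the cluster to track.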
But getting back to my original question: will doubling or tripling my shard count improve performance (assuming my nodes can handle the added resource requirements)?
Or would it be better to keep the shard size the same and split my indices into 6-hour increments, which would give me 4 times my current number of indices?