My production server is on 1.7.5 and I expect it to stay that way for quite a while. Still, it looks to me like even for the 2.x series the guidance is "use 1 for spinning disk!", with no clear distinction between single drives and arrays.
If I used the raw algorithm documented there, it would recommend that I allocate 4 threads (max(1, min(4, 24/2)) = 4)... which would be around 300 MB/s per thread at that point. However, my understanding is that concurrent writes to RAID on spinning disk show diminishing returns, so I'm wondering if I should go a little lower in that case. (Edit: the first time I read it, I thought the calculation would give 12 threads; 4 seems almost sane.)
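To make the arithmetic concrete, here is a small sketch of the default formula as the merge scheduler docs describe it, `Math.max(1, Math.min(4, availableProcessors / 2))`. The class and method names (`MergeThreads`, `defaultMaxThreadCount`) are hypothetical, and this assumes the processor count fed into the formula equals what the JVM reports; Elasticsearch itself may bound or override that value, so treat this as an illustration rather than the actual implementation.

```java
public class MergeThreads {
    // Hypothetical helper reproducing the documented default for
    // index.merge.scheduler.max_thread_count:
    //   Math.max(1, Math.min(4, availableProcessors / 2))
    // Integer division, so 1 or 2 processors both yield 1 thread.
    static int defaultMaxThreadCount(int availableProcessors) {
        return Math.max(1, Math.min(4, availableProcessors / 2));
    }

    public static void main(String[] args) {
        // Show how the default caps out at 4 threads regardless of core count.
        int[] cores = {1, 2, 4, 8, 24};
        for (int c : cores) {
            System.out.println(c + " processors -> " + defaultMaxThreadCount(c) + " merge threads");
        }
        // Assumption: the JVM-reported count is what the formula would see.
        System.out.println("this JVM -> "
                + defaultMaxThreadCount(Runtime.getRuntime().availableProcessors()) + " merge threads");
    }
}
```

With 24 processors this gives min(4, 12) = 4, so the cap of 4 dominates on any box with 8 or more cores; the spinning-disk advice of 1 is an explicit override of this default, not something the formula accounts for.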
I guess what I'm wondering is whether the guidance applies to anything at all that spins, or whether it has more to do with I/O throughput, in which case RAID might be a possible exception, albeit a cautious one.
Are you seeing any issues around merging or merge throttling with your current settings?
To be sure you are selecting an optimal value, I would generally recommend, where possible, benchmarking the options and seeing what difference they make on your particular hardware.
Yes, I currently end up with merge throttling when indexing at both the low and high ends of our indexing rates. The high end doesn't surprise me so much as the low end: it doesn't make sense to me that I can't keep up with merging when throttling is disabled and I'm only doing a low EPS.