I'm applying an ILM policy to older (daily) indices. In the warm phase I have enabled forcemerge to reduce the number of segments.
If I assign the ILM policy to multiple indices, say 30 days' worth, will the forcemerge run on all the indices at the same time, or does Elasticsearch handle them one at a time?
I'm asking because if Elasticsearch ran all the force merges at the same time, I figure this would result in high fragmentation of the files on disk.
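For reference, here is a rough sketch of the kind of policy I mean, written as a Python dict so it is easy to print and inspect. The `forcemerge` action and its `max_num_segments` parameter are the real ILM names; the helper name `build_warm_forcemerge_policy`, the `min_age` of `7d`, and the segment count of 1 are only assumptions for illustration, not my actual settings.

```python
import json


def build_warm_forcemerge_policy(min_age="7d", max_num_segments=1):
    """Return an ILM policy body whose warm phase runs a forcemerge action.

    min_age and max_num_segments are illustrative defaults (assumptions).
    """
    return {
        "policy": {
            "phases": {
                "warm": {
                    "min_age": min_age,
                    "actions": {
                        # forcemerge reduces each shard to at most this many segments
                        "forcemerge": {"max_num_segments": max_num_segments}
                    },
                }
            }
        }
    }


policy = build_warm_forcemerge_policy()
# This JSON body is what would be sent when creating the policy
# (for example with PUT _ilm/policy/<policy-name>).
print(json.dumps(policy, indent=2))
```

The same policy is then attached to every daily index, which is why I'm wondering how the resulting force merges get scheduled.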
... but only if you increase the number of force-merge threads from the default of 1.
All this is kind of irrelevant, however, because force merges will be interleaved with other write operations (including automatic merges) so there's definitely no guarantee that serial force-merges imply no fragmentation. It's even more irrelevant because you will likely only run into actual problems related to fragmentation if your disks are nearly full and/or you're using some ancient filesystem that doesn't control fragmentation properly.
2TB of data on 4TB spinning disks with fragmentation at 92%. The weekly defrag task didn't work that well, I guess... After all the force-merges were finished I let the customer do a manual defrag, so it's no longer a problem.
It's >7 years old and had mainstream support withdrawn 2½ years ago, so it's certainly no spring chicken...
Was it a problem before? How did that problem manifest? I get that fragmentation was reported, but I don't understand how that actually affected anything important.
Like the OS, the server and its disks are no spring chickens either.
Fragmentation can have a negative performance impact on spinning disks. So to keep performance as good as possible, less fragmentation is better.
With newer servers with more memory and SSD storage this is no longer an issue...
Yes in theory it can have a performance impact - my question is whether it really did. I would expect the difference to be lost in the noise in most cases. Can you quantify the improvement that you observed via manual defragmentation in terms of its effect on actual performance? If so, that's unexpected and interesting to me. If not, why worry about it?
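To make "quantify" concrete, one way is to record the latency of the same set of searches before and after the defrag and compare the median and a high percentile. The sketch below is only an assumption about how such a comparison could be done; the function names `summarize` and `relative_change` are hypothetical, and the sample lists are made-up placeholders, not measurements from any real cluster.

```python
import math
import statistics


def summarize(latencies_ms):
    """Return the median and the nearest-rank 95th percentile of the samples."""
    ordered = sorted(latencies_ms)
    # Nearest-rank percentile: the smallest value with at least 95% of samples at or below it.
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {"median": statistics.median(ordered), "p95": ordered[p95_index]}


def relative_change(before, after):
    """Fractional change from before to after (negative means faster)."""
    return (after - before) / before


# Placeholder samples purely for illustration (NOT real measurements).
before_ms = [120, 130, 125, 140, 135]
after_ms = [118, 131, 124, 139, 133]

before_stats = summarize(before_ms)
after_stats = summarize(after_ms)
for key in ("median", "p95"):
    change = relative_change(before_stats[key], after_stats[key])
    print(f"{key}: {before_stats[key]} ms -> {after_stats[key]} ms ({change:+.1%})")
```

If the change is smaller than the run-to-run variation of the same query set, it is effectively lost in the noise.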
I did not do any performance tests before and after, so unfortunately I cannot tell you if there is any noticeable difference. Maybe I should have done so, but at the time I wrote my question, I had already started with the force-merge actions and de-fragmentation.
This question was mainly based on experiences I had in the past with highly fragmented drives, where Windows and some disk-intensive applications would benefit from low file fragmentation (at that time disks were a lot slower than currently available drives, though).