This doc says Elasticsearch requires enough free disk space for a second copy of the index when splitting shards:
The node handling the split process must have sufficient free disk space to accommodate a second copy of the existing index.
But this discussion seems to indicate that it depends on how many shards are being created. When splitting a 5-shard index into 20 shards, it requires 4x the disk space, right?
Should I file a GitHub issue to get the documentation fixed? Or am I misunderstanding? Maybe the hard links that splitting creates make it seem like it's using more space than it really is.
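For concreteness, this is the kind of split request I mean (a minimal sketch using Python's `requests` against a local cluster, with hypothetical index names; the source index also has to be blocked for writes first):

```python
import requests

# Hypothetical names: split "my-source-index" (5 shards) into
# "my-target-index" with 20 shards. The target shard count must be
# a multiple of the source shard count, and the source index must
# already have index.blocks.write set.
resp = requests.post(
    "http://localhost:9200/my-source-index/_split/my-target-index",
    json={"settings": {"index.number_of_shards": 20}},
)
print(resp.json())
```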
Yeah, I think you're misunderstanding. The total size of all the shards will indeed go up by a large amount, but this metric double-counts any files that are hard-linked between shards. The actual disk space needed should only be approximately 2x the size of the original shards.
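A quick way to see why per-shard sizes double-count: hard-linked paths point at the same inode, so summing file sizes overstates what is actually on disk. A generic filesystem sketch (not Elasticsearch code, assuming a POSIX system):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "segment_original")
    linked = os.path.join(d, "segment_split_copy")

    # Write a 1 MiB "segment" file, then hard-link it, roughly as the
    # split operation does for segment files shared between shards.
    with open(original, "wb") as f:
        f.write(b"\0" * (1024 * 1024))
    os.link(original, linked)

    # Summing per-path sizes counts the same data twice.
    naive_total = os.path.getsize(original) + os.path.getsize(linked)

    # Block usage is tracked per inode, and both paths share one inode,
    # so the data occupies the disk only once.
    st = os.stat(original)
    print("sum of file sizes:", naive_total)             # 2 MiB (double-counted)
    print("links to the same inode:", st.st_nlink)       # 2
    print("actual blocks on disk:", st.st_blocks * 512)  # ~1 MiB
```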