I am trying to create a backup script for our Elasticsearch clusters. We
are running on Amazon SSD machines so EBS snapshots are not an option.
Instead I am trying to go the rsync route.
My problem is that rsync complains about files vanishing while it is
syncing (in the CLUSTER_NAME/nodes/0/indices/INDEX_NAME/SHARD/index
directory). I am disabling flush and running a manual flush before I launch
the rsync.
I assume that this is due to segments being merged, but I thought that
would not happen since I have disabled flush (at least it looks like I
have). Or could a merge already have been in progress when I disabled
flush? Can I see if it is merging any segments?
Alternatively is there some other way this could be happening?
A quick follow up to myself. I added code to my script which prints the
total number of flush and merge operations (from the _nodes/stats API). I
print these values after I have disabled flush and then again after rsync
has failed (but before I enable flush). The result is that the flush.total
is constant but the merges.total counter has increased.
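
For anyone wanting to reproduce this check, here is a minimal sketch in Python (not my actual script). It assumes the _nodes/stats response nests the counters under nodes.<id>.indices.flush.total and nodes.<id>.indices.merges.total, and that merges.current reports merges running right now; the function names and the localhost URL are made up for illustration.

```python
import json
import urllib.request


def sum_counters(stats):
    """Sum flush and merge counters over all nodes in a _nodes/stats response."""
    totals = {"flush_total": 0, "merges_total": 0, "merges_current": 0}
    for node in stats.get("nodes", {}).values():
        indices = node.get("indices", {})
        totals["flush_total"] += indices.get("flush", {}).get("total", 0)
        merges = indices.get("merges", {})
        totals["merges_total"] += merges.get("total", 0)
        # Assumed field: number of merges in progress at the time of the call.
        totals["merges_current"] += merges.get("current", 0)
    return totals


def fetch_counters(base_url="http://localhost:9200"):
    """Fetch node stats over HTTP; base_url is an assumed address."""
    with urllib.request.urlopen(base_url + "/_nodes/stats") as resp:
        return sum_counters(json.load(resp))
```

Calling fetch_counters() once after disabling flush and once after rsync fails, then comparing the two dictionaries, gives the same comparison as described above.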
Yes, merges might still happen behind the scenes (they might also already be in progress). This can add files to, or remove files from, the index directory, but there won't be a new "commit" point in Lucene, so it's OK to copy the files over even if they "change".
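
One way a backup script could act on this is to treat rsync's "vanished source files" result as tolerable rather than fatal. The sketch below is only an illustration, not something from this thread: it relies on rsync's documented exit code 24 ("Partial transfer due to vanished source files"), and the function names and the choice of the -a flag are assumptions.

```python
import subprocess

# rsync exit code for "Partial transfer due to vanished source files".
RSYNC_VANISHED = 24


def rsync_outcome(returncode):
    """Classify an rsync exit code for the backup script."""
    if returncode == 0:
        return "ok"
    if returncode == RSYNC_VANISHED:
        # Files disappeared mid-copy, e.g. segments removed by a merge.
        return "ok-with-vanished-files"
    return "failed"


def run_rsync(src, dest):
    """Run rsync in archive mode and classify the result (src and dest are placeholders)."""
    result = subprocess.run(["rsync", "-a", src, dest])
    return rsync_outcome(result.returncode)
```

Whether "ok-with-vanished-files" is acceptable for a given backup is a judgment call; the script should still fail on any other non-zero code.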