Files vanishing during backups

I am trying to create a backup script for our Elasticsearch clusters. We
are running on Amazon SSD machines so EBS snapshots are not an option.
Instead I am trying to go the rsync route.

My problem is that rsync complains about files vanishing while it is
syncing (in the CLUSTER_NAME/nodes/0/indices/INDEX_NAME/SHARD/index
directory). I am disabling flush and running a manual flush before I launch
the rsync.

I assume this is due to segments being merged, but I expected that not to
happen since I have disabled flush (at least it looks like I have). Could
it be that a merge was already in progress when I disabled flush? Is there
a way to see whether any segments are currently being merged?

Alternatively is there some other way this could be happening?

/MaF

--

A quick follow up to myself. I added code to my script which prints the
total number of flush and merge operations (from the _nodes/stats API). I
print these values after I have disabled flush and then again after rsync
has failed (but before I enable flush). The result is that the flush.total
is constant but the merges.total counter has increased.
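
A minimal sketch of that kind of check, in Python. The host and port, the
helper names, and the exact layout of the _nodes/stats response (per-node
"indices" -> "flush" / "merges" objects with "total" and "current" fields)
are assumptions here and may differ between Elasticsearch versions, so
treat it as an illustration rather than the actual script.

```python
import json
from urllib.request import urlopen


def sum_counters(stats):
    """Sum flush and merge counters over all nodes in a _nodes/stats reply.

    Assumes each node entry looks like
    {"indices": {"flush": {"total": N}, "merges": {"total": N, "current": N}}}.
    Missing fields are counted as 0.
    """
    totals = {"flush.total": 0, "merges.total": 0, "merges.current": 0}
    for node in stats.get("nodes", {}).values():
        indices = node.get("indices", {})
        flush = indices.get("flush", {})
        merges = indices.get("merges", {})
        totals["flush.total"] += flush.get("total", 0)
        totals["merges.total"] += merges.get("total", 0)
        totals["merges.current"] += merges.get("current", 0)
    return totals


def fetch_counters(base_url="http://localhost:9200"):
    """Fetch _nodes/stats from a node (hypothetical URL) and sum the counters."""
    with urlopen(base_url + "/_nodes/stats") as resp:
        return sum_counters(json.load(resp))
```

Calling fetch_counters() once after disabling flush and once after rsync
returns, then comparing the two dictionaries, shows which counters moved.
If the response has a merges "current" field as assumed, a non-zero value
would indicate merges in progress at that moment.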

--

Yes, merges might still happen behind the scenes (they might also be ongoing). This might add files to (or remove files from) the index directory, but there won't be a new "commit" point in Lucene, so it's OK to copy over the files even if they "change".
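
If the backup script treats any non-zero rsync exit status as a failure, one option is to single out exit code 24, which rsync documents as "Partial transfer due to vanished source files". A rough Python sketch follows; the function names and the choice of the -a flag are illustrative assumptions, and whether a "vanished" result is acceptable for a given backup is a judgment the script's owner has to make.

```python
import subprocess

# rsync exit code 24: "Partial transfer due to vanished source files".
RSYNC_VANISHED = 24


def classify_rsync_exit(code):
    """Map an rsync exit code to 'ok', 'vanished', or 'error'."""
    if code == 0:
        return "ok"
    if code == RSYNC_VANISHED:
        return "vanished"
    return "error"


def run_backup(src, dest):
    """Run rsync and raise only on exit codes other than 0 and 24.

    src and dest are placeholders for the index directory and the
    backup target.
    """
    result = subprocess.run(["rsync", "-a", src, dest])
    status = classify_rsync_exit(result.returncode)
    if status == "error":
        raise RuntimeError("rsync failed with exit code %d" % result.returncode)
    return status
```

With this, a run that only hit vanished files returns "vanished" so the script can log it and carry on, while any other non-zero exit code still aborts the backup.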

On Jan 24, 2013, at 6:03 PM, maf@recordedfuture.com wrote:

A quick follow up to myself. I added code to my script which prints the total number of flush and merge operations (from the _nodes/stats API). I print these values after I have disabled flush and then again after rsync has failed (but before I enable flush). The result is that the flush.total is constant but the merges.total counter has increased.

--