I intend to schedule replication from one cluster into a "backup cluster".
I've considered the snapshot API, but it cannot back up from one ES cluster directly into another; going through a shared filesystem, S3 or HDFS repository are the only options.
Also, reading the forums, someone mentioned that rsync between the clusters' filesystems was an option, but the clusters may have different topologies, and I don't want inconsistent data or downtime on the "destination" cluster.
So, moving to a solution based on Logstash, my configuration file is:
input {
  elasticsearch {
    # Source cluster; the port can go directly into the "hosts" entry.
    hosts => [ "HOSTNAME_HERE:9200" ]
    index => "INDEXNAME_HERE"
    size => 1000      # documents per scroll page
    scroll => "5m"    # how long each scroll context is kept alive
    docinfo => true   # exposes _index/_type/_id under [@metadata] for the output below
    scan => true      # scan-type search; only supported on older plugin/ES versions
  }
}
output {
  elasticsearch {
    # Destination ("backup") cluster.
    hosts => [ "HOSTNAME_HERE:9200" ]
    # Keep each document's original index, type and id.
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  stdout {
    codec => "dots"   # prints one dot per event as a progress indicator
  }
}
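I start it with bin/logstash -f es-copy.conf (the filename is just an example); the stdout dots codec prints one dot per event so I can watch progress.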
But this does not seem to use the "scroll_id" at all, so each transfer is limited to "size" documents (1000 by default), and I obviously don't know the number of documents per index beforehand.
This should be as automatic as possible, ideally something like rsync but at the cluster level (all changes replicated to the backup cluster); a sketch of what I have in mind is below.
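For illustration, this is roughly the direction I'm thinking of, assuming a plugin version that supports the schedule and query options and that every document carries an indexed @timestamp field (the hosts, index name, time window and cron expression are all placeholders):

input {
  elasticsearch {
    hosts => [ "SOURCE_HOST:9200" ]
    index => "INDEXNAME_HERE"
    # Pull only documents changed recently (assumes a @timestamp field on every document).
    query => '{ "query": { "range": { "@timestamp": { "gte": "now-10m" } } } }'
    size => 1000
    scroll => "5m"
    docinfo => true
    # Re-run the query every five minutes (cron syntax).
    schedule => "*/5 * * * *"
  }
}
output {
  elasticsearch {
    hosts => [ "BACKUP_HOST:9200" ]
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}

Because the original _id is preserved, overlapping time windows would just overwrite the same documents, so repeated runs stay idempotent; deletes, however, would not be propagated this way.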
Any ideas / suggestions?
Thanks