Replicate all changes from one cluster to another

I intend to schedule replication from one cluster into a "backup cluster".

I've considered using the snapshot API (but it can't back up from one ES cluster directly into another; going through a shared FS, S3 or HDFS repository is the only option).
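
For reference, if going through a shared filesystem were acceptable, the flow would be roughly the following (the repository name, snapshot name and the path /mnt/es_backups are just placeholders, and the path would have to be reachable from both clusters):

# register a filesystem repository on the source cluster
curl -XPUT 'http://SOURCE_HOST:9200/_snapshot/backup_repo' -d '{
  "type": "fs",
  "settings": { "location": "/mnt/es_backups" }
}'

# snapshot all indices on the source cluster
curl -XPUT 'http://SOURCE_HOST:9200/_snapshot/backup_repo/snapshot_1?wait_for_completion=true'

# after registering the same repository on the destination cluster, restore there
curl -XPOST 'http://DEST_HOST:9200/_snapshot/backup_repo/snapshot_1/_restore'

But that requires shared storage and a scheduled snapshot/restore cycle, which is not what I'm after.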

Also, reading the forums, someone mentioned that rsync between the clusters' filesystems was an option (but the clusters may have different topologies, and I don't want inconsistent data or downtime in the "destination" cluster).

So, moving to a solution based on Logstash, my configuration file is:

input {
  # read all documents from the source index via a scan/scroll search
  elasticsearch {
    hosts => [ "HOSTNAME_HERE:9200" ]
    index => "INDEXNAME_HERE"
    size => 1000
    scroll => "5m"
    docinfo => true
    scan => true
  }
}

output {
  # write to the backup cluster, preserving index, type and id
  elasticsearch {
    hosts => [ "HOSTNAME_HERE" ]
    index => "%{[@metadata][_index]}"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
  # print a dot per event to show progress
  stdout {
    codec => "dots"
  }
}

But this does not seem to use the "scroll_id" at all, so my transfers are limited to "size" records (1000 by default), and I obviously don't know the number of documents per index beforehand.

This should be as automatic as possible, and ideally work like rsync but at cluster level (all changes should be replicated to the backup cluster).

Any ideas / suggestions?

Thanks

I don't see how this could work at all. The elasticsearch plugin is stateless and pulls all documents matching the condition each time. If you restart Logstash it'll pull all of it again.

I'm not really sure what problem you're trying to solve, but I'd send all processed events to a broker and use two "dumb" Logstash instances that each feed off a private queue on that broker, with each one feeding its own ES cluster.
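
As a rough sketch of one of those consumer instances (assuming a Redis broker; the host names and the list name es_backup_queue are just placeholders):

input {
  # consume events that the processing pipeline pushed onto this list
  redis {
    host => "BROKER_HOST_HERE"
    data_type => "list"
    key => "es_backup_queue"
  }
}

output {
  elasticsearch {
    hosts => [ "BACKUP_CLUSTER_HOST_HERE" ]
  }
}

The second instance would look the same apart from its own list name and pointing at the primary cluster, and the upstream pipeline that does the processing would simply have two redis outputs, one per list. Keep in mind that, as far as I know, [@metadata] fields don't survive serialization through the broker, so anything like index or document id you want to preserve would have to travel as regular fields.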
