Deploying a New Elastic Cluster and Migrating Live Logs

Hi folks,

I have the following use case:

I want to install a new 7.13.2 Elasticsearch cluster on some brand-new VMs and migrate the indices and global cluster state from a 7.6.2 cluster into it.

The challenge is: logs are shipped to the cluster continuously via Logstash, and I do not want any downtime, duplicate logs, or a single missing log line during the migration.

I have already set everything up, including Kibana, all security settings, and the system indices, and I am now testing different index migration scenarios.

The best, easiest, and safest scenario I have figured out so far is the following:

  1. Change the Logstash output of a pipeline to the new cluster nodes
  2. Reload the Logstash pipelines (SIGHUP)
  3. Logstash creates a new index in the new cluster according to the template, alias, and ILM policy configuration
  4. New logs now flow to the new cluster and no longer to the old one
  5. I create a snapshot of the index in the old cluster
  6. I restore the index with a _restored suffix in the new cluster, so all Kibana index patterns will show the restored logs too
  7. Migration is done.
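The output switch in step 1 is just a matter of repointing the elasticsearch output's hosts; a minimal sketch, with placeholder hostnames:

```
output {
  elasticsearch {
    # previously pointed at the 7.6.2 nodes; hostnames below are placeholders
    hosts => ["https://new-es-node1:9200", "https://new-es-node2:9200"]
    # index, template, and ILM settings stay exactly as before, so Logstash
    # creates the new index in the new cluster on the first event
  }
}
```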

The current drawbacks of this solution:

  1. I rely on Logstash's internal mechanisms to ensure no logs get dropped during the pipeline output switch to the new Elasticsearch nodes. I could verify this by randomly comparing log lines, but that is of course not concrete proof.
  2. There will be a window where logs from the old cluster are not yet available in the new cluster, which is...well...not great, but tolerable.

Can anyone share a better way of migrating live data from one Elastic cluster to another without downtime or losing any log data?

I am thankful for any hints

What is your Logstash input? Depending on the input, you could lose some logs while Logstash is reloading the pipeline; this would happen, for example, if you have a UDP input receiving network device data.

What snapshot repository are you going to use? Do both of your clusters already have it configured? To add a file system repository, or one that needs a plugin like the S3 or GCP one, you will need to restart every node to configure it.

Reindex from remote could also be an option, but it also needs a restart.
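For reference, a reindex-from-remote call would look roughly like this (hostnames, credentials, and index names are placeholders); the restart comes from the fact that the old cluster's address must first be added to `reindex.remote.whitelist` in elasticsearch.yml on the new cluster:

```
POST _reindex
{
  "source": {
    "remote": {
      "host": "https://old-es-node1:9200",
      "username": "elastic",
      "password": "<password>"
    },
    "index": "logs-example"
  },
  "dest": {
    "index": "logs-example"
  }
}
```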

Another way would be a Logstash pipeline that has your 7.6.2 cluster as an elasticsearch input and your 7.13.2 cluster as an elasticsearch output. Since you are running VMs, you could spin up a temporary machine to do this and leave it running until it finishes migrating your index.
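Such a one-off migration pipeline could be sketched like this (hosts and index pattern are placeholders; the docinfo fields preserve the original index name and document id, so documents land in the same index with the same id):

```
input {
  elasticsearch {
    hosts => ["https://old-es-node1:9200"]
    index => "logs-*"
    docinfo => true  # exposes [@metadata][_index] and [@metadata][_id]
  }
}

output {
  elasticsearch {
    hosts => ["https://new-es-node1:9200"]
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}
```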

Thanks for the quick reply. The input is in most cases Beats, like:

input {
  beats {
    port => 6150
    ssl => true
    ssl_certificate_authorities => ["/path/to/ca.crt"]
    ssl_certificate => "/path/to/server.crt"
    ssl_key => "/path/to/server.key"
    ssl_key_passphrase => "${SERVER_KEY}"
    ssl_verify_mode => "force_peer"
  }
}

and one input which uses a virtual address:

input { pipeline { address => ["[kubernetes][namespace]"] }}

I am using snapshots stored in Azure Blob Storage. In both clusters I have installed the respective Azure plugin, and I can access the repository and easily restore snapshots.

I am not sure I have understood you correctly:

You suggest changing the output of my Logstash pipelines to a pipeline which has the new Elastic cluster as output? What is the benefit? At the moment I apply this change, all new logs will be shipped to the new cluster instead of the old one. Is this correct?

Beats has an internal queue, so when you reload your Logstash pipeline it will appear temporarily unavailable to Beats, but Beats will resend the logs as soon as the Logstash pipeline is back up.

I am using snapshots stored in Azure Blob Storage. In both clusters I have installed the respective Azure plugin, and I can access the repository and easily restore snapshots.

If you already have snapshots working on both of your clusters, then you can use snapshot and restore to move your index from one cluster to the other.
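Combined with the `_restored` suffix from the original plan, the restore can rename the indices on the fly; repository, snapshot, and index names below are placeholders:

```
# on the old cluster: snapshot the index
PUT _snapshot/azure_repo/migration-snap?wait_for_completion=true
{
  "indices": "logs-example"
}

# on the new cluster: restore it under a _restored suffix
POST _snapshot/azure_repo/migration-snap/_restore
{
  "indices": "logs-example",
  "rename_pattern": "(.+)",
  "rename_replacement": "$1_restored"
}
```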

You suggest to change the output of my logstash pipelines to a pipeline which has the new elastic cluster as output?

No, it was a suggestion for moving data from one cluster to another if you didn't have snapshots configured, or didn't want to or couldn't restart your nodes. Using a Logstash pipeline with an elasticsearch input and an elasticsearch output would allow you to copy data from one cluster to another. Since you have snapshots configured, you do not need to do this.

What is the size of your cluster(s)? Have you considered instead first adding the new nodes to the existing cluster, relocating the shards onto them, and then slowly removing the nodes of the old cluster?

Ok thanks. Now I have understood.

Regarding the snapshots: when I restore a snapshot to my new cluster (which works fine), I will only have the logs that were captured in the snapshot (thousands of events per second are being indexed). So, for example, with a snapshot created at 10:00 AM I can restore the logs that were indexed until 10:00 AM. Everything that came in afterwards is of course not part of the snapshot. So there will always be some gap, which I need to fill somehow.

This is the problem I am trying to solve with the approach described in my initial post.

The current data size is 1.5 TB on disk (replicated), spread over 3 nodes (data/master). This amount will also be restored to the new cluster.

I have tried adding nodes one by one to the "old" cluster in a test environment I created for this use case, but I very quickly ran into critical problems and broke the cluster, because I messed up some relocation settings and could not relocate shards from the 7.13.2 nodes back to 7.6.2. So this approach was too risky to execute on a production cluster.

I would suggest that you first migrate your ingestion pipelines to the new cluster, and only after that start creating the snapshots in the old cluster, since at that point no new data is being written to it anymore.

This way there will be no changes to the data in the old cluster, and you will just need to restore those snapshots in the new cluster.


Alright, I think we will proceed this way. Even though it means that for some time (until we have restored the snapshots to the new cluster) users won't be able to find their old logs, I think we need to accept this trade-off.