I know I have seen this question asked before but I couldn't find a solid
answer for it in the group, so I am probably repeating the question. I
wanted to know what the recommended solution, if there is one, is regarding
upgrading Elasticsearch versions. I feel that a rolling upgrade can be
performed against a production environment without any data loss, but I
can't say that I am 100% confident in it, mostly due to a lack of
experience with doing it. Let me describe our setup and our proposed
solution to see if you all agree that this is the best route to go.
The Scenario
In our situation we have two 3-node clusters running in multiple data
centers for redundancy purposes, with identical data being fed into the two
systems. Each cluster contains one index (aliased, of course) with 8 shards
and 2 replicas, for a total of 24 shard copies. Cluster settings are
minimum_master_nodes = 2, local storage, recover_after_master_nodes = 2,
recover_after_time = 5m, and expected_nodes = 3. All nodes can be a
master node.
Upgrade Process #1 - Rolling Upgrade
If we were ever to perform an upgrade I think I would prefer a rolling
upgrade over using the Shutdown API, so that our system stays live the whole
time. I'm just afraid that at some point during the process the system will
be down because of the settings we use to avoid split-brain. This will
probably happen because nodes running different Elasticsearch versions may
not be able to join the same cluster, and with minimum_master_nodes set to a
quorum (2 of 3) there will be a point when the nodes are split and the
cluster is down. The downside is that there would be 'downtime' for the
amount of time it takes to upgrade one server. I assume the behavior would
be something like this (a health-check sketch follows the list):
Upgrade Elasticsearch on Server One (stop service, install new binaries,
start service)
  1. Replicas will rebalance between Server Two and Server Three.
  2. Server One will not join the current cluster due to the version
     difference.
  3. Server One will wait in 'recovery' until at least 2 master-eligible
     nodes are available.
  4. Server Two and Server Three remain in a healthy cluster.
Upgrade Elasticsearch on Server Two (stop service, install new binaries,
start service)
  1. Replicas will now all be on Server Three.
  2. Server Two will form a cluster with Server One and they will recover
     indexes based on the data they have in their data directories. Here is
     where I am not 100% sure that we have 100% data coverage. If we are
     evenly balanced then there should be at least one copy of each shard in
     each data directory.
  3. Server Three isn't healthy anymore since there aren't enough masters
     available.
Upgrade Elasticsearch on Server Three (stop service, install new binaries,
start service)
  1. Server Three will join the cluster with Server Two and Server One.
  2. Replicas will get pushed to Server Three. Perhaps not fresh copies if
     it already had a portion of an existing replica in its data directory?
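Between restarts, something like the following sketch (not part of our
current process) could be used to confirm the remaining nodes report a
healthy cluster before the next server is touched; it assumes a node is
reachable on localhost:9200 and uses the standard _cluster/health endpoint:

    import json
    import urllib.request

    def wait_for_status(host="http://localhost:9200", status="green",
                        timeout="10m"):
        # Blocks until the cluster reaches the requested status or the
        # timeout elapses; the response carries a "timed_out" flag if the
        # cluster never got there.
        url = (f"{host}/_cluster/health"
               f"?wait_for_status={status}&timeout={timeout}")
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)

    health = wait_for_status()
    print(health["status"], health["number_of_nodes"], health.get("timed_out"))

If the call times out before the requested status is reached, the response
still comes back with timed_out set to true, which is a reasonable signal to
pause the rolling upgrade rather than move on to the next node.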
Upgrade Process #2 - Full Cluster Shutdown
The downside to this is that we might be 'down' for the amount of time it
takes to upgrade two servers as opposed to one (a sketch of the shutdown
call follows the list).
  1. Execute the Shutdown API against all servers so that no replica
     re-balancing takes place.
  2. Upgrade each server to the new version of Elasticsearch.
  3. Bring them back up one at a time.
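As a sketch of that first step, assuming the cluster-wide nodes shutdown
endpoint from the 0.x line (POST /_cluster/nodes/_shutdown); the exact URL
differs between versions and the API was removed in later releases:

    import urllib.request

    # Ask every node in the cluster to shut down; an empty POST body is
    # enough for this endpoint.
    req = urllib.request.Request(
        "http://localhost:9200/_cluster/nodes/_shutdown",
        data=b"",
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())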
From the scenarios above it sounds like both options leave us with a slight
amount of downtime where we couldn't read from or write to the cluster. For
us, working in the finance industry, we can't miss any data and we are
constantly writing to Elasticsearch. Thankfully, since we have to operate
two data centers that are independent of one another, we have a utility that
compares the data that was recently inserted/updated (5-10 minute range
searches) in Elasticsearch and attempts to merge data from one cluster to
another. With that utility I'm guaranteed not to have data loss, but it
still means that one of my data centers would have to be taken offline while
we do this upgrade.
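Roughly, that kind of reconciliation could be sketched as below; this is a
simplified illustration rather than the actual utility, and the index alias
("myindex"), the "@timestamp" field, and the host names are hypothetical:

    import json
    import urllib.request

    def recent_ids(host, index="myindex", minutes=10):
        # Return the _ids of documents whose @timestamp falls within the
        # last N minutes on the given cluster.
        query = {
            "size": 10000,
            "_source": False,
            "query": {"range": {"@timestamp": {"gte": f"now-{minutes}m"}}},
        }
        req = urllib.request.Request(
            f"{host}/{index}/_search",
            data=json.dumps(query).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            hits = json.load(resp)["hits"]["hits"]
        return {h["_id"] for h in hits}

    # Documents recently written to data center A but missing from B are
    # the candidates to copy across (the re-indexing step is omitted here).
    missing_in_b = recent_ids("http://dc-a:9200") - recent_ids("http://dc-b:9200")
    print(len(missing_in_b), "documents to merge into data center B")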
I don't understand why a rolling upgrade would cause downtime. You have 3
nodes in the cluster with minimum_master_nodes set to 2, so if you restart
one node at a time you will be good to go. Note, though, that a full cluster
shutdown is required when upgrading to a new major version (0.17 -> 0.18,
for example).
In the first scenario there would be downtime when Server One was on 0.18,
Server Two was shut down and being upgraded, and Server Three was on 0.17.
Because Server One and Server Three were on different versions they couldn't
form a cluster, and because minimum_master_nodes is set to two they would
probably be in a red/yellow state. I assume that in that state no read/write
operations can happen (maybe reads, but I can't imagine writes are allowed).
As such there would be downtime until at least two master-eligible nodes can
see each other. Regardless, I think for major upgrades we are going to need
to shut down and then do a data sync against our other data center until
Elasticsearch can handle different versions joining a cluster together.