So I'm about to upgrade to 1.3.4, but due to some unfortunate circumstances I need to migrate my ES cluster to new VMs.

The environment is fairly simple. At the top I have a Logstash agent pulling messages off a Redis server and feeding them to my 2-node cluster (1 replica, 2 shards per index). So, for what it's worth, I can stop Logstash and the cluster will essentially stop indexing data, allowing me to shut it down without issue. Once the old cluster is shut down, I intend to rsync its data over to the new cluster, which has 3 nodes (2 replicas, 3 shards per index).
What is the best approach here? I was thinking that I could rsync the data folder from one of my two VMs on the old cluster, but then I realized that the primary shard for each index might not be on that VM. Can I manually set the primary shard somehow? (The kind of copy I have in mind is sketched below.)
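For concreteness, this is roughly the copy I have in mind; the data path is just where my packages put things, and the target host name is a placeholder, so adjust both to taste:

    # on old node 1, with Logstash and both old ES nodes stopped
    # /var/lib/elasticsearch and new-node-1 are placeholders for my setup
    rsync -av /var/lib/elasticsearch/ new-node-1:/var/lib/elasticsearch/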
On 24 October 2014 17:54, Ivan Brusic wrote:
Unless you are moving to new hardware, there is no need to rsync your data. Both Elasticsearch 0.90.x and 1.3.x are based on Lucene 4, so the underlying data is compatible. Of course, you should back up your data before such an upgrade.
After restarting your new cluster with your old data, I would run an optimize on your indices so that Lucene can rewrite all your segments in the new format. There have been some issues with Lucene format incompatibilities, but they usually involve indices created with beta Lucene versions. A sketch of the call is below.
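Something like this should do it (untested; scope it to specific indices if you prefer):

    # merge down to one segment per shard, rewriting everything in the
    # current Lucene format; heavy on I/O, so run it off-peak
    curl -XPOST 'http://localhost:9200/_optimize?max_num_segments=1'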
You cannot bring up a mixed cluster between 0.90 and 1.x, so you would need to stop all your VMs. Why are you interested in primary shards? Elasticsearch is not like most databases, where the primary node has an extra-special connotation. I have not played around with shard allocation much, but here is an old article: ElasticSearch Shard Placement Control - Sematext
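If you do want to place shards by hand, the cluster reroute API can move a shard copy between nodes. This is just the shape of the call (the index and node names below are made up):

    # index and node names are placeholders
    curl -XPOST 'http://localhost:9200/_cluster/reroute' -d '{
      "commands" : [ {
        "move" : {
          "index" : "logstash-2014.10.23",
          "shard" : 0,
          "from_node" : "node1",
          "to_node" : "node3"
        }
      } ]
    }'

As far as I know it only moves copies around; it does not let you declare which copy is the primary.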
On Fri, Oct 24, 2014 at 8:03 PM, Magnus Persson wrote:
Oh, I didn't know about optimize, so I'll definitely keep that in mind.

The reason I was asking about primary shards is that when I started from a data folder rsync'd off one of the nodes, I saw double the number of documents. It wasn't immediately apparent, but when I later tried two rsyncs, matching old node 1 with new node 1 and old node 2 with new node 2, the "duplicates" went away... and the cluster recovered significantly faster. But reading this, it seems to be sufficient to rsync the data folder from any one node in the old cluster and things will just work? Is there a way to verify the consistency of my cluster? Something like index checksums, or some such?
On Monday, October 27, 2014 12:37:48 PM UTC+1, Magnus Persson wrote:
This is very strange. I shut down the old cluster while copying the files, but for some reason I'm seeing duplicate docs again, with ~3.2M docs on the old cluster and ~6.3M docs on the new cluster (using Kopf to compare). Am I missing something obvious? At one point I think I got the document counts to match up, but I'm obviously not able to reach that state again. (One way I can at least inspect per-shard counts is sketched below.)
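For what it's worth, on the new (1.3) cluster I can see per-shard document counts with the cat API; something like:

    # the docs column shows the doc count for each shard copy
    curl -s 'http://localhost:9200/_cat/shards?v'

The 0.90 cluster predates the _cat APIs, though, so there I'm stuck with Kopf.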
On Friday, October 24, 2014 11:42:27 PM UTC+2, Jörg Prante wrote:
The plan to move from a 2-node to a 3-node cluster is as follows:
- back up your old data files (in case you want to go back; once upgraded, there is no way back)
- shut down the old cluster
- move the data folder of each old cluster node to a new cluster node's data folder. One node gets no data folder. No rsync required.
- check minimum_master_nodes = 2. This is essential for 3 nodes (see the sketch below).
- start up the cluster, all nodes. Watch the shards rebalance. No need to worry about primary shards.
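A minimal sketch of that setting, assuming the stock package location for the config file:

    # on each of the three new nodes; the config path is an assumption
    echo 'discovery.zen.minimum_master_nodes: 2' >> /etc/elasticsearch/elasticsearch.yml

With 3 master-eligible nodes, a quorum of 2 keeps a partitioned single node from electing itself master (split brain).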
Jörg
On Monday, October 27, 2014 1:16:40 PM UTC+1, Magnus Persson wrote:
https://gist.github.com/magnusp/515a5c3debed12802d1f is the configuration I'm running on the new cluster. The old cluster uses the defaults that came with 0.90.3 (replicas and shards were set via templates, I guess).
On Monday, October 27, 2014 3:21:24 PM UTC+1, Magnus Persson wrote:
When using the count API, the document counts match up much more closely. It might be that Kopf is counting documents differently on 0.90 than on 1.3 (perhaps including replica copies in its totals), but that seems far-fetched. The comparison I ran was along the lines of the sketch below.
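Roughly this, with the host names as placeholders for one node in each cluster:

    # _count counts each logical document once, regardless of replicas
    curl -s 'http://old-node:9200/_count?pretty'
    curl -s 'http://new-node:9200/_count?pretty'

Comparing the "count" fields from the two responses is what lined up for me.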