Single instance cluster borked after restart. Please help me understand what happened

Steve_Johnson_3 · February 5, 2015, 9:40pm

I have a single ES AWS instance that I've been logging to as part of a
standard ELK stack. It has been running for about 4 weeks. I had
replication disabled.

Today, I decided to stop and start the instance so that I could increase
its memory size. When it came back up, only about half of the 5
shards for each index were assigned...in some cases two, and in some cases
three.

After fooling around a bunch, I looked on the disk, where I found that each
of my indexes was stored across two high-level directories,
'/data1/elasticsearch/es-vpc3/nodes/0' and
'/data1/elasticsearch/es-vpc3/nodes/1'. It is the shards stored in the '0'
directory that were being assigned. The shards stored in '1' were not. This
indicated to me that I must have been running two ES nodes on the one
AWS instance without knowing it. So I figured, 'what the heck', and I started a
second copy of ES. Sure enough, my other shards were assigned to the second
node!
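
For anyone wanting to run the same on-disk check, here is a minimal sketch that lists which shard directories sit under each "nodes/N" directory. It assumes the 1.x-era layout `<data path>/<cluster name>/nodes/<N>/indices/<index>/<shard number>`, which matches the paths above but is not confirmed anywhere in this thread. The function name `shards_by_node_dir` is made up for illustration.

```python
import os
from collections import defaultdict


def shards_by_node_dir(cluster_data_path):
    """Map each nodes/<N> ordinal to {index name: sorted shard ids} found on disk.

    Assumes the layout <cluster_data_path>/nodes/<N>/indices/<index>/<shard>.
    """
    result = {}
    nodes_root = os.path.join(cluster_data_path, "nodes")
    for ordinal in sorted(os.listdir(nodes_root)):
        indices_root = os.path.join(nodes_root, ordinal, "indices")
        if not os.path.isdir(indices_root):
            continue
        per_index = defaultdict(list)
        for index_name in sorted(os.listdir(indices_root)):
            index_dir = os.path.join(indices_root, index_name)
            if not os.path.isdir(index_dir):
                continue
            for entry in os.listdir(index_dir):
                # Shard directories are named by shard number; skip anything else
                # (for example a _state directory).
                if entry.isdigit() and os.path.isdir(os.path.join(index_dir, entry)):
                    per_index[index_name].append(int(entry))
        result[ordinal] = {name: sorted(ids) for name, ids in per_index.items()}
    return result
```

Calling `shards_by_node_dir("/data1/elasticsearch/es-vpc3")` on the machine described above should show, per index, which shard numbers live under '0' and which under '1'.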

Here's the problem though. I've been using the HEAD plugin to view my
cluster. Prior to the reboot, the display represented the cluster as a
single ES node with all of the shards being shown together in a single
row. Now, I get two rows, one for each ES node, with each row showing the
shards that match what I saw in the corresponding directory on disk.
So something is clearly different than it was before. It appears that
I was not running two distinct ES nodes before the reboot. So what was I doing? Why
did my indexes get split across two "nodes/N" directories and why upon
reboot did only the "nodes/0" shards get assigned?

Can someone tell me what is going on here....what was different about my
setup before and after the reboot? Surely just giving the machine more
memory couldn't have caused this, right?

TIA for any enlightenment.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3aac59f7-30b9-40fc-af61-6de6c0764d6e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

dadoonet · February 5, 2015, 9:58pm

Once shards are evenly distributed across the nodes, they won't move again on their own.
What you should do is start the two nodes, set the number of replicas to 1, then kill node 2 and set the number of replicas back to 0.

Or you can use the cluster reroute API: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-reroute.html
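
The replica approach could be sketched roughly as below. This is only an illustration: it assumes ES is reachable at `http://localhost:9200` and that the change should apply to every index (`_all`). The helper names `replica_settings_request` and `send` are invented for this example. The update index settings endpoint and the `number_of_replicas` setting are standard.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: default HTTP port on the local machine


def replica_settings_request(replicas, index="_all", base_url=ES_URL):
    """Build (but do not send) a PUT <index>/_settings request that changes the replica count."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The sequence would then be `send(replica_settings_request(1))` with both nodes running, waiting until cluster health reports green (so every shard has a copy on each node), stopping the second node, and finally `send(replica_settings_request(0))`. Stopping the second node before the replicas have finished allocating would lose the shards that only it holds.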

David

Steve_Johnson_3 · February 6, 2015, 12:00am

Thanks David for your response.

I don’t understand what you’re saying, or maybe you don’t understand what happened. Before the restart, my shards weren’t distributed at all. All 5 shards for each index were on one node, and replication was set to 0. I restarted the instance, and all of that was still true, but half of the shards got assigned back to the one and only node and the other half remained unassigned. I’d like to understand why a reboot led to a different state, a state that left the system unusable.

I played with the reroute API a bit. That doesn’t help because I lose the shard data if I force the unassigned shards back onto the one node…they end up empty if I do that.
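
For reference, a forced allocation in the 1.x-era reroute syntax looks roughly like the body built below. It is a sketch, not the exact request used here: the helper name `allocate_command` is made up, and the index and node names passed to it are placeholders. As far as I understand it, `allow_primary` lets a primary be allocated on a node that has no copy of that shard on disk, so the shard starts out empty. That would be consistent with the data loss described above, but the thread does not confirm that this flag was used. Later Elasticsearch versions replaced this command with differently named ones.

```python
import json


def allocate_command(index, shard, node, allow_primary=False):
    """Build a 1.x-style cluster reroute body containing a single 'allocate' command."""
    return {
        "commands": [
            {
                "allocate": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    # True forces a primary onto the node even without existing data there.
                    "allow_primary": allow_primary,
                }
            }
        ]
    }
```

The resulting dictionary would be serialized with `json.dumps` and sent as the body of a POST to `/_cluster/reroute`.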

Maybe you’re only telling me how to recover given that creating a second node on the same instance has gotten all my data back online. If so, thanks for that. I’ll give what you’re saying a try to see if that helps. My bigger concern is why I have to go through this, mainly because I’m worried it will be required every time I restart this instance. I need to understand the issue here.

Steve

Steve_Johnson_3 · February 6, 2015, 12:07am

I don’t know if this helps at all, but now that I’m running two nodes, all of the shards of the latest index, the one created for the new day, got assigned to just one of the two nodes.

Steve

Steve_Johnson_3 · February 6, 2015, 12:53am

Thanks David for your response.

I don’t understand what you’re saying, or maybe you don’t understand what
happened. Before the restart, my shards weren’t distributed at all. All 5
shards for each index were on one node, and replication was set to 0. I
restarted the instance, and all of that was still true, but half of the
shards got assigned back to the one and only node and the other half
remained unassigned. I’d like to understand why a reboot led to a
different state, a state that left the system unusable.

I played with the reroute API a bit. That doesn’t help because I lose the
shard data if I force the unassigned shards back onto the one node…they end
up empty if I do that.

Maybe you’re only telling me how to recover given that creating a second
node on the same instance has gotten all my data back online. If so,
thanks for that. I’ll give what you’re saying a try to see if that helps.
My bigger concern is why I have to go through this, mainly because I’m
worried it will be required every time I restart this instance. I need to
understand the issue here.

I don’t know if this helps at all, but now that I’m running two nodes, all
of the shards of the latest index, the one created for the new day, got
assigned to just one of the two nodes.

Steve
