Node will not shut down


(Brad Jordan-2) #1

I have a 4 node ES cluster. I am adding new docs at a rate of about 3k/sec.
After an hour my cluster status turns to yellow. I see that node 2 is no
longer part of the cluster when I curl the "_cluster/state" endpoint.
Hitting the "_cluster/health" endpoint reveals that I have 872 unassigned
shards.

I log in to node 2 and find ES still running. I want to restart ES on node 2
to see if it will rejoin my cluster. I issue the command "curl -XPOST
'http://localhost:9200/_cluster/nodes/_local/_shutdown'" and get back
"curl: (56) Failure when receiving data from the peer". I try this several
more times and ES will still not shut down on node 2. I believe that if I
kill -9 the process at this point, the cluster will move from "yellow" to
"red". How do I gracefully recover from this situation? Is it possible?

Thanks,
Brad

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/3e8c18e8-6f7c-48b7-a7a4-ed0c2a072871%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Ben Hundley) #2

2 questions:

  1. What size servers are you using? Knowing how much RAM and # cores would be very helpful.

  2. Definitely sounds like a massive load. Are you going to continually be inserting 3k docs per sec? ~260mil documents a day?
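
The daily figure is simple arithmetic on the quoted rate, assuming the 3k docs/sec were sustained around the clock. A quick check (the function name is only for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def docs_per_day(docs_per_second: int) -> int:
    """Documents indexed in 24 hours at a constant rate."""
    return docs_per_second * SECONDS_PER_DAY


print(docs_per_day(3000))  # 259200000, i.e. roughly 260 million
```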


(Brad Jordan-2) #3

This is a DEV env. I've got 24G of RAM on all 4 machines. 12G for ES and
12G for the OS. I believe the machines are quad core HP Z-800's.

I will not be inserting at this rate very often. My question is more
operational. How do you recover from the place I am in? If I kill -9 the ES
process on node 2 I believe I will put my cluster in the red state.

I did get into this unhappy spot once before. After trying to shut down ES
on node 2 I eventually kill -9'd it. At that point my cluster was in the
red state and unable to service requests. The "unassigned_shards" number
was not changing. I have daily indexes so I simply deleted the most recent
daily index and rebuilt it. At this point my cluster had all 4 nodes and
was green again. In production this approach is not popular with mgmt., so
I'm trying to find a less heavy-handed approach :wink:
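
Before deleting anything, it may help to see which indices are actually red. The following is only a sketch: it assumes the cluster health endpoint is called with `level=indices` so that the response carries a per-index `indices` section, and the sample data and index names are made up for illustration.

```python
import json
from urllib.request import urlopen


def non_green_indices(health: dict) -> dict:
    """Map index name -> status for every index that is not green."""
    return {
        name: info.get("status")
        for name, info in health.get("indices", {}).items()
        if info.get("status") != "green"
    }


def fetch_index_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch per-index health. base_url is an assumed address of a live node."""
    with urlopen(base_url + "/_cluster/health?level=indices", timeout=10) as resp:
        return json.load(resp)


# Made-up sample in the shape of a level=indices response
# (hypothetical daily index names, not taken from the thread).
sample = {
    "status": "red",
    "indices": {
        "logs-2014.01.20": {"status": "green", "unassigned_shards": 0},
        "logs-2014.01.21": {"status": "red", "unassigned_shards": 4},
    },
}
print(non_green_indices(sample))  # {'logs-2014.01.21': 'red'}
```

Against a real cluster, `non_green_indices(fetch_index_health())` would give the same kind of mapping, which narrows the problem down to specific indices.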

-Brad

On Tuesday, January 21, 2014 2:54:52 PM UTC-7, Ben Hundley wrote:

2 questions:

  1. What size servers are you using? Knowing how much RAM and # cores
    would be very helpful.

  2. Definitely sounds like a massive load. Are you going to continually be
    inserting 3k docs per sec? ~260mil documents a day?


--
View this message in context:
http://elasticsearch-users.115913.n3.nabble.com/Node-will-not-shut-down-tp4047940p4047942.html
Sent from the ElasticSearch Users mailing list archive at Nabble.com.



(Brad Jordan-2) #4

Just an update... I waited for the "unassigned_shards" number to reach zero,
at which point the cluster health reported GREEN but the cluster still only
had 3 nodes. I was then able to execute the curl to shut down node 2 and
then restart it. It joined the cluster again and everyone was happy. I guess
the moral of the story is to just wait and ES will fix itself? Patience is a
virtue? Not sure but ES did eventually fix itself :slight_smile:
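
The waiting step can be scripted. This is a rough sketch, not a definitive procedure: it polls `_cluster/health` on a node that is still part of the cluster and returns once `unassigned_shards` is zero. Also requiring `initializing_shards` and `relocating_shards` to be zero is an extra assumption beyond what is described above, and the URL, poll interval, and function names are placeholders.

```python
import json
import time
from urllib.request import urlopen


def recovery_finished(health: dict) -> bool:
    """True when no shards are unassigned, initializing, or relocating."""
    pending = (
        health["unassigned_shards"]
        + health.get("initializing_shards", 0)  # extra check, an assumption
        + health.get("relocating_shards", 0)    # extra check, an assumption
    )
    return pending == 0


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Read cluster health from an assumed live node at base_url."""
    with urlopen(base_url + "/_cluster/health", timeout=10) as resp:
        return json.load(resp)


def wait_for_recovery(fetch=fetch_health, poll_seconds: float = 30.0,
                      max_polls: int = 240) -> dict:
    """Poll until recovery_finished() holds, or give up after max_polls."""
    for _ in range(max_polls):
        health = fetch()
        if recovery_finished(health):
            return health
        time.sleep(poll_seconds)
    raise TimeoutError("shards still pending after %d polls" % max_polls)
```

Once `wait_for_recovery()` returns, the shutdown request against the stuck node can be retried.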

-Brad

On Tuesday, January 21, 2014 3:42:51 PM UTC-7, Brad Jordan wrote:

This is a DEV env. I've got 24G of RAM on all 4 machines. 12G for ES and
12G for the OS. I believe the machines are quad core HP Z-800's.

I will not be inserting at this rate very often. My question is more
operational. How do you recover from the place I am in? If I kill -9 the ES
process on node 2 I believe I will put my cluster in the red state.

I did get into this unhappy spot once before. After trying to shut down ES
on node 2 I eventually kill -9'd it. At that point my cluster was in the
red state and unable to service requests. The "unassigned_shards" number
was not changing. I have daily indexes so I simply deleted the most recent
daily index and rebuilt it. At this point my cluster had all 4 nodes and
was green again. In production this approach is not popular with mgmt. so
I'm trying to understand a less heavy handed approach :wink:

-Brad



(Norberto Meijome) #5

Brad,
What version of ES?
What was ES doing on the problem node? (Hot threads call / log file /
strace.) Any related OS info (was it I/O bound)?
If it was really hung, I am not sure why the shutdown would work after
moving the shards off it (i.e. once the cluster was green). It sounds to me
like it was too busy doing something...

The other nodes had already decided your problem node was out of the
cluster, right?
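
For the hot threads call, a minimal sketch follows. It assumes the node exposes the hot threads API at `/_nodes/{node}/hot_threads` with `threads` and `interval` parameters; the exact path can differ between ES versions (the version is not stated in this thread), so check the docs for the release in use. The helper names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url: str = "http://localhost:9200",
                    node: str = "_local",
                    threads: int = 3,
                    interval: str = "500ms") -> str:
    """Build the hot threads URL for one node (path is an assumption)."""
    query = urlencode({"threads": threads, "interval": interval})
    return f"{base_url}/_nodes/{node}/hot_threads?{query}"


def fetch_hot_threads(url: str, timeout: float = 10.0) -> str:
    """Return the plain-text hot threads report from the node."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(hot_threads_url())
# http://localhost:9200/_nodes/_local/hot_threads?threads=3&interval=500ms
```

Running `fetch_hot_threads(hot_threads_url())` on the problem node, if it still answers HTTP at all, would show what its busiest threads are doing.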
On 22/01/2014 9:55 AM, "Brad Jordan" climberbrad@gmail.com wrote:

Just an update... I waited for the "unassigned_shards" number to reach
zero at which point the cluster_state reported GREEN but still only had 3
nodes. I was then able to execute the curl to shut down the node 2 and then
restart it. It joined the cluster again and everyone was happy. I guess the
moral of the story is to just wait and ES will fix itself? Patience is a
virtue? Not sure but ES did eventually fix itself :slight_smile:

-Brad


