What happened to my ES cluster? Please help me


(이지홍) #1

My cluster consists of 3 instances, with IPs ending in 15 through 17.
This morning, instance 17 left the cluster.
In the elasticsearch-head plugin on instance 15, instance 17's status shows as "Unassigned", and instance 16 cannot be found at all.
What happened?
Please, somebody help me.

  1. Instance 17 log messages (below):

[2014-04-20 03:29:28,539][INFO ][discovery.zen ] [10.32.240.17] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
[2014-04-20 03:29:28,540][INFO ][cluster.service ] [10.32.240.17] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
[2014-04-20 03:30:01,320][DEBUG][action.admin.cluster.node.stats] [10.32.240.17] failed to execute on node [a0qNnjLvQSauGEddNxKmNw]
org.elasticsearch.index.engine.EngineClosedException: [jp_listened_calcu_log][0] CurrentState[CLOSED]

    1. Instance 15 log messages:
      [2014-04-20 03:27:18,747][INFO ][discovery.zen ] [10.32.240.15] master_left [[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]], reason [failed to ping, tried [3] times, each with maximum [30s] timeout]
      [2014-04-20 03:27:18,757][INFO ][cluster.service ] [10.32.240.15] master {new [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]], previous [10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]]}, removed {[10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]],}, reason: zen-disco-master_failed ([10.32.240.16][YL2_5dVaTQ-_3Rvm1yKzoA][inet[/10.32.240.16:21001]])
      [2014-04-20 03:28:28,544][WARN ][transport ] [10.32.240.15] Received response for a request that has timed out, sent [68787ms] ago, timed out [38787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310608]
      [2014-04-20 03:28:28,544][WARN ][transport ] [10.32.240.15] Received response for a request that has timed out, sent [38787ms] ago, timed out [8787ms] ago, action [discovery/zen/fd/masterPing], node [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], id [10310609]
      [2014-04-20 03:28:28,552][INFO ][discovery.zen ] [10.32.240.15] master_left [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]], reason [no longer master]
      [2014-04-20 03:28:28,557][INFO ][cluster.service ] [10.32.240.15] master {new [10.32.240.15][dE_q8O-dT-SeUlTBuM-yiQ][inet[/10.32.240.15:21001]], previous [10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]}, removed {[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]],}, reason: zen-disco-master_failed ([10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]])
      [2014-04-20 03:29:28,546][WARN ][discovery.zen ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
      [2014-04-20 03:29:28,548][WARN ][discovery.zen ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
      org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
      at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
      at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
      at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
      at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
      at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
      ... 7 more
      [2014-04-20 03:29:28,603][WARN ][discovery.zen ] [10.32.240.15] received cluster state from [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] which is also master but with an older cluster_state, telling [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]] to rejoin the cluster
      [2014-04-20 03:29:28,604][WARN ][discovery.zen ] [10.32.240.15] failed to send rejoin request to [[10.32.240.17][a0qNnjLvQSauGEddNxKmNw][inet[/10.32.240.17:21001]]]
      org.elasticsearch.transport.SendRequestTransportException: [10.32.240.17][inet[/10.32.240.17:21001]][discovery/zen/rejoin]
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
      at org.elasticsearch.discovery.zen.ZenDiscovery$7.execute(ZenDiscovery.java:541)
      at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
      at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
      at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
      at java.lang.Thread.run(Thread.java:662)
      Caused by: org.elasticsearch.transport.NodeNotConnectedException: [10.32.240.17][inet[/10.32.240.17:21001]] Node not connected
      at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:834)
      at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:532)
      at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
      ... 7 more
  1. The Elasticsearch process on instance 17 is still alive:

/usr/bin/java -Xms2G -Xmx2G -Xss256k -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Delasticsearch -Des.path.home=/home/irteam/apps/elasticsearch-0.90.7 -cp :/home/irteam/apps/elasticsearch-0.90.7/lib/elasticsearch-0.90.7.jar:/home/irteam/apps/elasticsearch-0.90.7/lib/:/home/irteam/apps/elasticsearch-0.90.7/lib/sigar/ org.elasticsearch.bootstrap.ElasticSearch

  1. Configuration:
    cluster.name: music-es-beta
    node.name: 10.32.240.15
    http.port: 21200
    transport.tcp.port: 21001
    multicast.enabled: false
    index.number_of_shards: 3
    index.number_of_replicas: 1
    index.mapper.dynamic: false
    action.auto_create_index: false
    bootstrap.mlockall: true
    discovery.zen.ping.timeout: 10s
    index.cache.field.type: soft
    discovery.zen.ping.unicast.hosts: ["10.32.240.15", "10.32.240.16","10.32.240.17"]

  2. How should I set up the ES cluster for fail-over and fail-back?


(Binh Ly-2) #2

Could be something network related. From the logs, it looks like 16 dropped
out, and then 17 and 15 decided that 17 was the new master. If you have not
added more data since, you can restart 16 and see if it joins back into the
cluster. Regardless, you probably want to set
discovery.zen.minimum_master_nodes: 2 on all 3 of your nodes, to ensure that
a node that drops out will not form a cluster by itself and continue to
accept requests.
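
For reference, here is a minimal sketch of how that might look in each node's elasticsearch.yml. The quorum formula in the comment is the standard guideline for master-eligible nodes, not something specific to this thread; the value 2 follows from this cluster having 3 nodes:

```yaml
# On each of the 3 nodes, in elasticsearch.yml:
# quorum = floor(master_eligible_nodes / 2) + 1 = floor(3 / 2) + 1 = 2
# An isolated node then cannot elect itself master and keep accepting writes.
discovery.zen.minimum_master_nodes: 2
```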

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/209b2d44-04dd-4c18-bec7-8b2b14b046dc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #3

It looks like you lost connectivity between nodes; this may be due to GC.
Shut down all your nodes and then add this to your config:

  • discovery.zen.minimum_master_nodes: 2. Then restart your cluster one node
    at a time.

Are you using anything like ElasticHQ, kopf or marvel to monitor things?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(이지홍) #4

Thank you very much.

I have new questions.

First, what is the default value of discovery.zen.minimum_master_nodes?

Second, about the isolated node 16:
a client sends a request to save a document to the "abc" index on node 16 (which has not rejoined the cluster yet),
and the save succeeds.
After node 16 is restarted and has rejoined the cluster,
is the "abc" index data OK?
As long as there are no duplicate doc IDs, will it be OK
after nodes 15, 16, and 17 are merged back together?


(Ivan Brusic) #5

There is no default value for minimum_master_nodes. If it is not set, the
value is simply not used to determine whether the cluster is whole.

If the documents do not have duplicate IDs, they should be merged when the
node rejoins the cluster. If you set minimum_master_nodes, the cluster
will not accept any document inserts while the cluster is red. The cluster
will be red if only one node is present (in order to prevent split-brain).
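
As a general guideline (standard Elasticsearch advice rather than anything specific to this cluster), minimum_master_nodes should be a quorum of the master-eligible nodes, so the setting grows with the cluster:

```yaml
# quorum = floor(master_eligible_nodes / 2) + 1
#   3 master-eligible nodes -> 2   (this cluster: survives losing 1 node)
#   4 master-eligible nodes -> 3
#   5 master-eligible nodes -> 3   (survives losing 2 nodes)
discovery.zen.minimum_master_nodes: 2
```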

Cheers,

Ivan



(system) #6