Logging-and-metrics stuck

jmtrc · May 24, 2018, 4:07pm

The cluster looks healthy:
{
  "cluster_name" : "05274b09effb4f9f9d237d0f92204ce9",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 7,
  "number_of_data_nodes" : 7,
  "active_primary_shards" : 63,
  "active_shards" : 70,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

But when I tried to increase the size from 1GB to 2GB, the change seems to be stuck, with messages like these:

[2018-05-24 16:04:43,380][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Timeout waiting for cluster states to update, waiting a little before rechecking... {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,497][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Setting settings: [HttpRequest(PUT,/_cluster/settings,List(),HttpEntity(application/json; charset=UTF-8,{"transient":{"cluster":{"routing":{"allocation":{"exclude":{"_name":"instance-0000000001,1527177893494"},"awareness":{"attributes":""}}}}}}),HTTP/1.1)] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,655][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Found bad verifications: [List((ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000),Missing(ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000)),no.found.curator.pimps.FutureWatchedEvent@6df3d48e))] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,655][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Verified migration with result: [List((ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000006),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1e62aa77), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000007),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1b323d68), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000004),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@75722552), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000005),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@826be57), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000002),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@31c8c36), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000003),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@94f3840), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000),Missing(ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000)),no.found.curator.pimps.FutureWatchedEvent@6df3d48e), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000001),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@509a1b61))] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,494][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Need to set exclusions and awareness attributes. Current values are [Set(instance-0000000001, 1527177873195)] and [Set()] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}

I was watching /_cat/shards for anything not STARTED but all shards were reported STARTED. Does this mean anything?
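For anyone repeating this check, here is a minimal sketch of filtering /_cat/shards output for shards that are not STARTED. It assumes the default column order (index, shard, prirep, state, docs, store, ip, node) and works on text that has already been fetched; the index names, IP addresses, and counts in the sample are made up for illustration.

```python
def non_started_shards(cat_shards_text):
    """Return (index, shard, prirep, state) for rows whose state is not STARTED.

    Assumes the default /_cat/shards column order, where the state is the
    fourth whitespace-separated field.
    """
    rows = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if state != "STARTED":
            rows.append((index, shard, prirep, state))
    return rows


# Hypothetical sample output; not taken from the cluster in this thread.
SAMPLE_SHARDS = """\
logs-a 0 p STARTED 1200 1.1mb 10.0.0.1 instance-0000000001
logs-a 0 r UNASSIGNED
logs-b 0 p STARTED 900 800kb 10.0.0.2 instance-0000000002
logs-b 1 p INITIALIZING 10.0.0.2 instance-0000000002
"""

if __name__ == "__main__":
    for row in non_started_shards(SAMPLE_SHARDS):
        print(row)
```

An empty result only says that every shard copy is started. It does not say which nodes the shards live on, so on its own it may not explain why a plan is still waiting.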

Thanks.

Jin.

jmtrc · May 24, 2018, 6:00pm

I also noticed that this cluster is configured with 1 zone and 1 node, yet 7 nodes are reported.

I tried to remove replicas by setting number_of_replicas to 0, but according to the ECE admin console only one node is vacant (holding no data). All the other nodes still report some data. It feels like something got messed up in previous attempts to increase the memory for this cluster.
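To cross-check what the admin console shows, the number of shards on each node can be counted from the same /_cat/shards text. This is only a sketch: it assumes the default column order, in which the node name is the eighth field of an assigned shard row, and the sample rows and the list of expected nodes are invented for illustration.

```python
from collections import Counter


def shards_per_node(cat_shards_text):
    """Count shard copies per node name from default /_cat/shards output."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        fields = line.split()
        # Assigned rows look like: index shard prirep state docs store ip node
        # Unassigned rows are shorter and carry no node, so they are skipped.
        if len(fields) >= 8:
            counts[fields[7]] += 1
    return counts


def vacant_nodes(expected_nodes, counts):
    """Return the expected node names that hold no shards at all."""
    return [name for name in expected_nodes if counts.get(name, 0) == 0]


# Hypothetical sample; the rows are illustrative, not from this cluster.
SAMPLE_ASSIGNED = """\
logs-a 0 p STARTED 1200 1.1mb 10.0.0.2 instance-0000000002
logs-b 0 p STARTED 900 800kb 10.0.0.2 instance-0000000002
logs-c 0 p STARTED 50 40kb 10.0.0.3 instance-0000000003
"""

if __name__ == "__main__":
    counts = shards_per_node(SAMPLE_ASSIGNED)
    print(dict(counts))
    print(vacant_nodes(["instance-0000000001", "instance-0000000002"], counts))
```

The list of expected node names has to come from somewhere else, for example from /_cat/nodes, since nodes without shards never appear in the shard listing.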

jmtrc · May 25, 2018, 7:45pm

Under the same ECE administration, I noticed that another, more important 2x2 cluster reports 6 nodes plus one tiebreaker node. Looking at the constructor logs, I see similar entries:

[2018-05-25 12:47:03,587][INFO ][no.found.cluster.plan.elasticsearch.40215dec0d49463cac59ab45379c8478] Need to set exclusions and awareness attributes. Current values are [Set(instance-0000000028, instance-0000000027, 1527252403592)] and [Set()] {"cluster_id":"40215dec0d49463cac59ab45379c8478"}

[2018-05-25 12:47:07,799][INFO ][no.found.cluster.plan.elasticsearch.40215dec0d49463cac59ab45379c8478] Found bad verifications: [List((ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000028),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1c5c72ae), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000039),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@10860212), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000037),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@5c4f6624), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000027),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@14e0d97c), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000038),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@6bd6004c), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000040),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@4b6009b6), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),tiebreaker-0000000036),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@6cb938b8))] {"cluster_id":"40215dec0d49463cac59ab45379c8478"}
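These "Found bad verifications" lines are hard to read by eye. A small sketch like the one below pulls out each instance name together with what was logged for it: either Missing, or the first argument of ClusterStateVerified. The pattern is inferred only from the log lines quoted in this thread, so it is an assumption about the format rather than a documented one, and the meaning of the true/false flag is not interpreted here.

```python
import re

# Matches "ElasticsearchInstance(ElasticsearchCluster(<id>),<name>)," followed
# by either "Missing" or "ClusterStateVerified(true|false". Format inferred
# from the constructor log lines quoted above.
_VERIFICATION = re.compile(
    r"ElasticsearchInstance\(ElasticsearchCluster\([0-9a-f]+\),([\w-]+)\),"
    r"(Missing|ClusterStateVerified\((?:true|false))"
)


def summarize_verifications(log_line):
    """Map instance name -> 'missing', 'verified=true' or 'verified=false'."""
    summary = {}
    for name, status in _VERIFICATION.findall(log_line):
        if status == "Missing":
            summary[name] = "missing"
        else:
            # status looks like "ClusterStateVerified(true"
            summary[name] = "verified=" + status.split("(", 1)[1]
    return summary


# Shortened from the second log line above (two of the seven entries).
SAMPLE_LINE = (
    "Found bad verifications: [List("
    "(ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000028),"
    "ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),"
    "no.found.curator.pimps.FutureWatchedEvent@1c5c72ae), "
    "(ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),tiebreaker-0000000036),"
    "ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),"
    "no.found.curator.pimps.FutureWatchedEvent@6cb938b8))]"
)

if __name__ == "__main__":
    for name, status in sorted(summarize_verifications(SAMPLE_LINE).items()):
        print(name, status)
```

The same function also covers the Missing entries seen for instance-0000000000 in the first cluster's log.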

These log entries started appearing after an attempt to move nodes from one runner to another using the admin console.

Any suggestions to sort this out?

Thank you.

Jin.

system · June 8, 2018, 7:45pm

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.
