Logging-and-metrics stuck


(Jin) #1

The cluster looks healthy:
{
"cluster_name" : "05274b09effb4f9f9d237d0f92204ce9",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 7,
"number_of_data_nodes" : 7,
"active_primary_shards" : 63,
"active_shards" : 70,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}

But when I tried to increase the size from 1GB to 2GB, the change seemed to get stuck, with messages like these:

[2018-05-24 16:04:43,380][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Timeout waiting for cluster states to update, waiting a little before rechecking... {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,497][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Setting settings: [HttpRequest(PUT,/_cluster/settings,List(),HttpEntity(application/json; charset=UTF-8,{"transient":{"cluster":{"routing":{"allocation":{"exclude":{"_name":"instance-0000000001,1527177893494"},"awareness":{"attributes":""}}}}}}),HTTP/1.1)] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,655][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Found bad verifications: [List((ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000),Missing(ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000)),no.found.curator.pimps.FutureWatchedEvent@6df3d48e))] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,655][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Verified migration with result: [List((ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000006),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1e62aa77), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000007),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1b323d68), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000004),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@75722552), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000005),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@826be57), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000002),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@31c8c36), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000003),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@94f3840), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000),Missing(ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000000)),no.found.curator.pimps.FutureWatchedEvent@6df3d48e), (ElasticsearchInstance(ElasticsearchCluster(05274b09effb4f9f9d237d0f92204ce9),instance-0000000001),ClusterStateVerified(true,QEw_XswtQxaT4y22ulhZrw,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@509a1b61))] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
[2018-05-24 16:04:53,494][INFO ][no.found.cluster.plan.elasticsearch.05274b09effb4f9f9d237d0f92204ce9] Need to set exclusions and awareness attributes. Current values are [Set(instance-0000000001, 1527177873195)] and [Set()] {"cluster_id":"05274b09effb4f9f9d237d0f92204ce9"}
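
The `Setting settings` line shows the plan writing a transient `cluster.routing.allocation.exclude._name` value. The `Need to set exclusions` line reports the value the constructor found already in place. The numeric token next to the instance name appears to be an epoch-millisecond timestamp. 1527177893494 decodes to 2018-05-24 16:04:53.494 UTC, which matches the log line, and 1527177873195 is about 20 seconds earlier. Below is a minimal Python sketch for reading the exclusions currently in effect. It assumes the usual `GET /_cluster/settings` response, which is nested by default and flat with `?flat_settings=true`. `base_url` and all helper names are hypothetical.

```python
import json
import urllib.request

# Setting key taken from the PUT body in the constructor log above.
EXCLUDE_KEY = "cluster.routing.allocation.exclude._name"


def lookup(settings: dict, dotted_key: str):
    """Read a setting given in flat ("a.b.c") or nested ({"a": {"b": ...}}) form."""
    if dotted_key in settings:
        return settings[dotted_key]
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def allocation_exclusions(cluster_settings: dict) -> dict:
    """Return the excluded node names for each scope ("persistent", "transient")."""
    result = {}
    for scope in ("persistent", "transient"):
        raw = lookup(cluster_settings.get(scope, {}), EXCLUDE_KEY)
        result[scope] = [n.strip() for n in raw.split(",") if n.strip()] if raw else []
    return result


def fetch_cluster_settings(base_url: str) -> dict:
    """GET /_cluster/settings; base_url is a hypothetical placeholder, add auth as needed."""
    with urllib.request.urlopen(f"{base_url}/_cluster/settings") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Transient body copied from the "Setting settings" log line above.
    sample = {
        "persistent": {},
        "transient": {"cluster": {"routing": {"allocation": {
            "exclude": {"_name": "instance-0000000001,1527177893494"},
            "awareness": {"attributes": ""},
        }}}},
    }
    print(allocation_exclusions(sample))
    # prints {'persistent': [], 'transient': ['instance-0000000001', '1527177893494']}
```

Elasticsearch moves shards away from any node whose name appears in `exclude._name`. Comparing the transient list with the instances in the admin console therefore shows which instance the plan is still trying to drain.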

I watched /_cat/shards for anything not STARTED, but all shards were reported as STARTED. Do these messages mean anything?
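
Green health with every shard STARTED suggests Elasticsearch itself is allocating normally. The `Found bad verifications` line instead refers to `instance-0000000000` as `Missing`, so the stall may lie in the plan's own verification step rather than in shard allocation. The sketch below, in Python, replaces watching the output by eye with a repeatable check. It lists every shard copy that is not STARTED in the output of `GET /_cat/shards?format=json`. `base_url` and the function names are hypothetical, and the sample rows are invented for illustration.

```python
import json
import urllib.request


def not_started(shards: list) -> list:
    """Return (index, shard, prirep, state, node) for every copy that is not STARTED."""
    return [
        (s["index"], s["shard"], s["prirep"], s["state"], s.get("node"))
        for s in shards
        if s.get("state") != "STARTED"
    ]


def fetch_shards(base_url: str) -> list:
    """GET /_cat/shards as JSON; base_url is a hypothetical placeholder, add auth as needed."""
    url = f"{base_url}/_cat/shards?format=json&h=index,shard,prirep,state,node"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical rows in the shape /_cat/shards?format=json returns.
    rows = [
        {"index": "logs-1", "shard": "0", "prirep": "p", "state": "STARTED", "node": "instance-0000000002"},
        {"index": "logs-1", "shard": "0", "prirep": "r", "state": "INITIALIZING", "node": "instance-0000000001"},
    ]
    for row in not_started(rows):
        print(row)
```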

Thanks.

Jin.


(Jin) #2

I also noticed that this cluster is configured with 1 zone and 1 node, yet 7 nodes are reported.

I tried to remove replicas by setting number_of_replicas to 0, but according to the ECE admin console only one node is vacant (holds no data); all the other nodes still report some data. I suspect something went wrong in earlier attempts to increase the memory for this cluster.
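
The admin console's per-node data view can be cross-checked against Elasticsearch directly. This minimal Python sketch counts STARTED shard copies per node from `/_cat/shards?format=json`. It also lists nodes that hold no copies, using the names from `/_cat/nodes?format=json&h=name`. `get_json`, `shards_per_node`, `base_url` and the sample data are hypothetical.

```python
import json
import urllib.request
from collections import Counter


def get_json(base_url: str, path: str):
    """GET a JSON endpoint; base_url is a hypothetical placeholder, add auth as needed."""
    with urllib.request.urlopen(f"{base_url}{path}") as resp:
        return json.load(resp)


def shards_per_node(shards: list, node_names: list) -> dict:
    """Count STARTED shard copies per node; nodes holding none are reported as 0."""
    counts = Counter(
        s["node"] for s in shards if s.get("state") == "STARTED" and s.get("node")
    )
    return {name: counts.get(name, 0) for name in sorted(set(node_names) | set(counts))}


if __name__ == "__main__":
    # Offline sample; against a live cluster one would use, for example:
    #   shards = get_json(base, "/_cat/shards?format=json&h=index,shard,prirep,state,node")
    #   nodes = [n["name"] for n in get_json(base, "/_cat/nodes?format=json&h=name")]
    shards = [
        {"node": "instance-0000000002", "state": "STARTED"},
        {"node": "instance-0000000002", "state": "STARTED"},
        {"node": "instance-0000000003", "state": "STARTED"},
    ]
    nodes = ["instance-0000000002", "instance-0000000003", "instance-0000000004"]
    for name, count in shards_per_node(shards, nodes).items():
        print(f"{name}: {count} shard copies")
```

Any node in `/_cat/nodes` that should no longer exist, but still holds shard copies here, is a candidate for the leftover instances from the earlier resize attempts.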


(Jin) #3

In the same ECE installation, I noticed that another, more important 2x2 cluster reports 6 nodes plus one tiebreaker node. The constructor logs show similar entries:

[2018-05-25 12:47:03,587][INFO ][no.found.cluster.plan.elasticsearch.40215dec0d49463cac59ab45379c8478] Need to set exclusions and awareness attributes. Current values are [Set(instance-0000000028, instance-0000000027, 1527252403592)] and [Set()] {"cluster_id":"40215dec0d49463cac59ab45379c8478"}

[2018-05-25 12:47:07,799][INFO ][no.found.cluster.plan.elasticsearch.40215dec0d49463cac59ab45379c8478] Found bad verifications: [List((ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000028),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@1c5c72ae), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000039),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@10860212), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000037),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@5c4f6624), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000027),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@14e0d97c), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000038),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@6bd6004c), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),instance-0000000040),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@4b6009b6), (ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),tiebreaker-0000000036),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),no.found.curator.pimps.FutureWatchedEvent@6cb938b8))] {"cluster_id":"40215dec0d49463cac59ab45379c8478"}
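
These verification lines are easier to compare when each is reduced to one status per instance. Below is a minimal Python sketch that extracts the instance name and result from a `Found bad verifications` or `Verified migration` line. Its regex is written against the lines pasted in this thread. It assumes the format stays exactly as shown and does not rely on any documented log format.

```python
import re

# Matches "ElasticsearchInstance(ElasticsearchCluster(<id>),<instance>),<result>"
# as it appears in the pasted constructor logs (format inferred, not documented).
ENTRY = re.compile(
    r"ElasticsearchInstance\(ElasticsearchCluster\(\w+\),([\w-]+)\),"
    r"(?:ClusterStateVerified\((true|false)|(Missing))"
)


def verification_status(line: str) -> dict:
    """Map each instance in a verification log line to verified / not verified / missing."""
    status = {}
    for name, verified, missing in ENTRY.findall(line):
        if missing:
            status[name] = "missing"
        else:
            status[name] = "verified" if verified == "true" else "not verified"
    return status


if __name__ == "__main__":
    # Two entries excerpted from the second cluster's "Found bad verifications" line.
    excerpt = (
        "(ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),"
        "instance-0000000028),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),"
        "no.found.curator.pimps.FutureWatchedEvent@1c5c72ae), "
        "(ElasticsearchInstance(ElasticsearchCluster(40215dec0d49463cac59ab45379c8478),"
        "tiebreaker-0000000036),ClusterStateVerified(false,9-rCAmBSRoKuo1pAJ-bt2g,Set(STARTED)),"
        "no.found.curator.pimps.FutureWatchedEvent@6cb938b8)"
    )
    print(verification_status(excerpt))
```

Applied to the full pasted lines, the first cluster's `Verified migration` line shows every instance verified except `instance-0000000000`, which is `Missing`. In the second cluster, every listed instance, including `tiebreaker-0000000036`, reports `ClusterStateVerified(false, ...)`.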

This happened after I tried to move nodes from one runner to another using the admin console.

Any suggestions to sort this out?

Thank you.

Jin.


(system) #4

This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.