7.3.1 After upgrading to 7.6.2, the cluster master freezes intermittently

Providing a link to a GitHub issue doesn't really clarify what you are expecting here, sorry to say.

Problems appeared on the morning of April 9, a few hours after the upgrade completed and the cluster had resumed normal operation. The specific symptoms were:

  • Replica shards cannot be allocated, newly created indices receive no data, and both the master and Logstash log "failed to process cluster event (put-mapping) within 30s".
  • Triggered rollovers get stuck.
  • The pending tasks queue holds more than a dozen tasks that never go away; normally it is empty (see the curl sketch after this list).
  • Deleting indices works normally, all nodes have joined the cluster, and node load is normal.
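For context, the pending queue was checked with the standard cluster APIs; a minimal sketch, assuming the cluster is reachable on localhost:9200 (adjust host/port and add authentication as needed):

# JSON view of the tasks currently queued on the master (normally an empty list)
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'

# compact tabular view of the same queue
curl -s 'http://localhost:9200/_cat/pending_tasks?v'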

Restarting the elected master triggers a re-election, after which the cluster behaves normally again, but on average it freezes once every few hours.
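Applying that workaround first requires knowing which node currently holds the master role; a minimal sketch, again assuming localhost:9200:

# show the id, host and name of the currently elected master
curl -s 'http://localhost:9200/_cat/master?v'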

In the six months before the upgrade we never had this problem, and the workload has not changed much. The cluster has 62 nodes in total, including 3 dedicated master nodes; hot nodes hold 50+ shards each, stale nodes 200+ and freeze nodes 300+.
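Per-node shard counts like those quoted above can be pulled from the cat APIs; a minimal sketch, assuming localhost:9200 (the "zone" attribute name is taken from the allocation explain output further down):

# shard count, disk usage and host for every data node
curl -s 'http://localhost:9200/_cat/allocation?v'

# custom node attributes, e.g. the hot/stale/freeze zone assignment
curl -s 'http://localhost:9200/_cat/nodeattrs?v'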

Logs (if relevant):
The following entries appear in the master's log only while the problem is occurring.

org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:143) [elasticsearch-7.6.2.jar:7.6.2]
    at java.util.ArrayList.forEach(ArrayList.java:1507) [?:?]
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:142) [elasticsearch-7.6.2.jar:7.6.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.2.jar:7.6.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:830) [?:?]


o.e.c.r.a.DiskThresholdMonitor skipping monitor as a check is already in progress

explaining the allocation for [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false], found shard [[push_up_new_log-2020.04.10-000049][1], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2020-04-10T03:06:20.497Z], delayed=false, allocation_status[no_attempt]]]

Executing the allocation explain API returns:

{
  "index": "push_up_new_log-2020.04.10-000049",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2020-04-10T03:06:20.497Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "yes",
  "allocate_explanation": "can allocate the shard",
  "target_node": {
    "id": "JLxrX6zoStKFzE6lFsi76w",
    "name": "data-51-hot",
    "transport_address": "10.90.141.133:9300",
    "attributes": {
      "zone": "hot",
      "xpack.installed": "true"
    }
  },
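The exact request that produced this (truncated) output is not shown above; it was presumably something like the following sketch, using the index and shard from the response:

curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": "push_up_new_log-2020.04.10-000049",
    "shard": 1,
    "primary": false
  }'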

Output of the cluster pending tasks API. The kibana_index_template entries can be ignored; they are always present, and I don't know which 6.x Kibana keeps connecting to this Elasticsearch cluster.

{
  "tasks": [
    {
      "insert_order": 34926,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": true,
      "time_in_queue_millis": 489,
      "time_in_queue": "489ms"
    },
    {
      "insert_order": 34927,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": false,
      "time_in_queue_millis": 82,
      "time_in_queue": "82ms"
    },
    {
      "insert_order": 34928,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": false,
      "time_in_queue_millis": 37,
      "time_in_queue": "37ms"
    },
    {
      "insert_order": 34883,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13545,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34882,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13546,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34888,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13494,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34884,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13535,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34887,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13494,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34897,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11554,
      "time_in_queue": "11.5s"
    },
    {
      "insert_order": 34886,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13523,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34889,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13487,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34896,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11554,
      "time_in_queue": "11.5s"
    },
    {
      "insert_order": 34885,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13531,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34902,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11295,
      "time_in_queue": "11.2s"
    },
    {
      "insert_order": 34895,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11792,
      "time_in_queue": "11.7s"
    },
    {
      "insert_order": 34898,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11460,
      "time_in_queue": "11.4s"
    },
    {
      "insert_order": 34901,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11295,
      "time_in_queue": "11.2s"
    },
    {
      "insert_order": 34899,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11458,
      "time_in_queue": "11.4s"
    },
    {
      "insert_order": 31866,
      "priority": "NORMAL",
      "source": "cluster_reroute(reroute after starting shards)",
      "executing": false,
      "time_in_queue_millis": 1212977,
      "time_in_queue": "20.2m"
    }
  ]
}
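To see what that stray template actually contains (and so get a hint about which client keeps re-creating it), the legacy index template API can be queried; a minimal sketch, using the template name from the task source above:

# dump the legacy index template referenced by the recurring create-index-template tasks
curl -s 'http://localhost:9200/_template/kibana_index_template?pretty'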

Attempting to downgrade produces:

java.lang.IllegalStateException: cannot downgrade a node from version [7.6.2] to version [7.3.1]
