7.3.1 After upgrading to 7.6.2, the cluster master freezes intermittently

Providing a link to a GitHub issue doesn't really clarify what you are expecting here, sorry to say.

Problems appeared on the morning of April 9, a few hours after the upgrade completed and the cluster had resumed normal operation. The specific symptoms were:

  • Replica shards cannot be allocated, newly created indices receive no data, and both the master and Logstash log "failed to process cluster event (put-mapping) within 30s".
  • Triggered rollovers get stuck.
  • The pending tasks queue holds more than a dozen tasks that never go away; normally it is empty (see the curl sketch after this list).
  • Deleting indices works normally, all nodes have joined the cluster, and node load is normal.
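For context, the pending queue was checked with the standard cluster APIs; a minimal sketch, assuming the cluster is reachable on localhost:9200 (adjust host/port and add authentication as needed):

# JSON view of the tasks currently queued on the master (normally an empty list)
curl -s 'http://localhost:9200/_cluster/pending_tasks?pretty'

# compact tabular view of the same queue
curl -s 'http://localhost:9200/_cat/pending_tasks?v'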

Restarting the elected master triggers a re-election, after which the cluster behaves normally again, but on average it freezes once every few hours.
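Applying that workaround first requires knowing which node currently holds the master role; a minimal sketch, again assuming localhost:9200:

# show the id, host and name of the currently elected master
curl -s 'http://localhost:9200/_cat/master?v'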

In the six months before the upgrade we never had this problem, and the workload has not changed much. The cluster has 62 nodes in total, including 3 dedicated master nodes; hot nodes hold 50+ shards each, stale nodes 200+ and freeze nodes 300+.
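Per-node shard counts like those quoted above can be pulled from the cat APIs; a minimal sketch, assuming localhost:9200 (the "zone" attribute name is taken from the allocation explain output further down):

# shard count, disk usage and host for every data node
curl -s 'http://localhost:9200/_cat/allocation?v'

# custom node attributes, e.g. the hot/stale/freeze zone assignment
curl -s 'http://localhost:9200/_cat/nodeattrs?v'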

Logs (if relevant):
The following entries appear in the master's log only while the problem is occurring.

org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:143) [elasticsearch-7.6.2.jar:7.6.2]
    at java.util.ArrayList.forEach(ArrayList.java:1507) [?:?]
    at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:142) [elasticsearch-7.6.2.jar:7.6.2]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) [elasticsearch-7.6.2.jar:7.6.2]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:830) [?:?]


o.e.c.r.a.DiskThresholdMonitor skipping monitor as a check is already in progress

explaining the allocation for [ClusterAllocationExplainRequest[useAnyUnassignedShard=true,includeYesDecisions?=false], found shard [[push_up_new_log-2020.04.10-000049][1], node[null], [R], recovery_source[peer recovery], s[UNASSIGNED], unassigned_info[[reason=INDEX_CREATED], at[2020-04-10T03:06:20.497Z], delayed=false, allocation_status[no_attempt]]]

Executing the allocation explain API returns:

{
  "index": "push_up_new_log-2020.04.10-000049",
  "shard": 1,
  "primary": false,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "INDEX_CREATED",
    "at": "2020-04-10T03:06:20.497Z",
    "last_allocation_status": "no_attempt"
  },
  "can_allocate": "yes",
  "allocate_explanation": "can allocate the shard",
  "target_node": {
    "id": "JLxrX6zoStKFzE6lFsi76w",
    "name": "data-51-hot",
    "transport_address": "10.90.141.133:9300",
    "attributes": {
      "zone": "hot",
      "xpack.installed": "true"
    }
  },
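The exact request that produced this (truncated) output is not shown above; it was presumably something like the following sketch, using the index and shard from the response:

curl -s -H 'Content-Type: application/json' \
  'http://localhost:9200/_cluster/allocation/explain?pretty' \
  -d '{
    "index": "push_up_new_log-2020.04.10-000049",
    "shard": 1,
    "primary": false
  }'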

Output of the cluster pending tasks API. The kibana_index_template entries can be ignored; they are always present, and I don't know which 6.x Kibana keeps connecting to this Elasticsearch cluster.

{
  "tasks": [
    {
      "insert_order": 34926,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": true,
      "time_in_queue_millis": 489,
      "time_in_queue": "489ms"
    },
    {
      "insert_order": 34927,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": false,
      "time_in_queue_millis": 82,
      "time_in_queue": "82ms"
    },
    {
      "insert_order": 34928,
      "priority": "URGENT",
      "source": "create-index-template [kibana_index_template:.kibana], cause [api]",
      "executing": false,
      "time_in_queue_millis": 37,
      "time_in_queue": "37ms"
    },
    {
      "insert_order": 34883,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13545,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34882,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13546,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34888,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13494,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34884,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13535,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34887,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13494,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34897,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11554,
      "time_in_queue": "11.5s"
    },
    {
      "insert_order": 34886,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13523,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34889,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13487,
      "time_in_queue": "13.4s"
    },
    {
      "insert_order": 34896,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11554,
      "time_in_queue": "11.5s"
    },
    {
      "insert_order": 34885,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 13531,
      "time_in_queue": "13.5s"
    },
    {
      "insert_order": 34902,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11295,
      "time_in_queue": "11.2s"
    },
    {
      "insert_order": 34895,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11792,
      "time_in_queue": "11.7s"
    },
    {
      "insert_order": 34898,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11460,
      "time_in_queue": "11.4s"
    },
    {
      "insert_order": 34901,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11295,
      "time_in_queue": "11.2s"
    },
    {
      "insert_order": 34899,
      "priority": "HIGH",
      "source": "put-mapping",
      "executing": false,
      "time_in_queue_millis": 11458,
      "time_in_queue": "11.4s"
    },
    {
      "insert_order": 31866,
      "priority": "NORMAL",
      "source": "cluster_reroute(reroute after starting shards)",
      "executing": false,
      "time_in_queue_millis": 1212977,
      "time_in_queue": "20.2m"
    }
  ]
}
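To see what that stray template actually contains (and so get a hint about which client keeps re-creating it), the legacy index template API can be queried; a minimal sketch, using the template name from the task source above:

# dump the legacy index template referenced by the recurring create-index-template tasks
curl -s 'http://localhost:9200/_template/kibana_index_template?pretty'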

Attempting to downgrade produces:

java.lang.IllegalStateException: cannot downgrade a node from version [7.6.2] to version [7.3.1]
