Node Leaving Cluster on Reindex

Hello all!

So I have been having this issue lately with Elasticsearch 2.4.

When I run

POST /_reindex
{
  "source": {
    "index": "indextest-2016.12.04"
  },
  "dest": {
    "index": "indextest-2016.12.04-2"
  }
}

I see random nodes leaving the cluster momentarily and then rejoining. If I throttle the reindex with requests_per_second=50, that seems to be a sweet spot where it barely happens.
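
For reference, this is roughly how I run the throttled version. As I understand it on 2.4, requests_per_second goes on the URL (please correct me if it belongs elsewhere):

POST /_reindex?requests_per_second=50
{
  "source": {
    "index": "indextest-2016.12.04"
  },
  "dest": {
    "index": "indextest-2016.12.04-2"
  }
}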

Here is a response from an unthrottled run once it's finished (the failing nodes change each time):

{
   "took": 182120,
   "timed_out": false,
   "total": 2999257,
   "updated": 7200,
   "created": 600,
   "batches": 9,
   "version_conflicts": 0,
   "noops": 0,
   "retries": 0,
   "throttled_millis": 0,
   "requests_per_second": "unlimited",
   "throttled_until_millis": 0,
   "failures": [
      {
         "shard": -1,
         "index": null,
         "reason": {
            "type": "node_not_connected_exception",
            "reason": "[NODE_2][IP:9300] Node not connected"
         }
      },
      {
         "shard": -1,
         "index": null,
         "reason": {
            "type": "node_not_connected_exception",
            "reason": "[NODE_2][IP:9300] Node not connected"
         }
      },
      {
         "shard": -1,
         "index": null,
         "reason": {
            "type": "node_not_connected_exception",
            "reason": "[NODE_6][IP:9300] Node not connected"
         }
      }
   ]
}

I have never noticed this before, and I am wondering how I can best go about seeing why a node is leaving and rejoining. There are no relevant entries in the master node's logs, or in the logs of the data node I see leaving. Very weird!

I was hoping for some insight into which stats or logs to look at.
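
The only concrete idea I have so far is to turn up discovery logging while the reindex runs, roughly like this (a sketch assuming the 2.x dynamic logger settings; I have not tried it yet):

PUT /_cluster/settings
{
  "transient": {
    "logger.discovery": "DEBUG"
  }
}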

Thanks

I've never seen it before either! Reindex can't abide the node that it is pulling data from leaving the cluster....

I wonder if the node that it is pulling data from is close to the edge performance-wise? Or are your documents really, really big? You might try setting the batch size lower if you have huge documents:

POST _reindex
{
  "source": {
    "index": "source",
    "size": 10  <------ Here. The default is 1000.
  },
  "dest": {
    "index": "dest"
  }
}
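
You can also keep an eye on the reindex while it runs with the task management API (it has been around since 2.3, though double-check the exact parameters on your version); it at least tells you which node the request is running on:

GET /_tasks?detailed=true&actions=*reindex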

Yeah, it's kind of weird!

It is in production, so as you said it might just be hitting some sort of performance ceiling.

The nodes are pretty decent performance-wise though, so that is a little odd.

I will try the batch size on source and see what happens, thanks!

I have included some stats on the index as well:

      "primaries": {
         "docs": {
            "count": 2999257,
            "deleted": 0
         },
         "store": {
            "size_in_bytes": 2722495408,
            "throttle_time_in_millis": 0
         },

Averaging roughly 907 bytes per document (2,722,495,408 bytes / 2,999,257 docs) doesn't look big.

Yeah... I tried with size: 10 and got the same issue. A node almost immediately disconnects from the cluster.

Load/CPU are basically nothing. I am thinking it may be LAN saturation or some kind of throttling.
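
For the network theory, the check I have in mind is the node stats API with the transport metric, something like this (I believe metric filtering on the URL works on 2.4, but I have not confirmed it):

GET /_nodes/stats/transport,os,process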

Another thought is that maybe the index has some bad spots on disk. I will try with different/smaller indices.
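
For the "smaller" test, the plan is to reindex just a slice by adding a query to the source, something like this (a sketch; the @timestamp field, the range, and the destination name are only placeholders for my data):

POST /_reindex
{
  "source": {
    "index": "indextest-2016.12.04",
    "query": {
      "range": {
        "@timestamp": {
          "lt": "2016-12-04T06:00:00"
        }
      }
    }
  },
  "dest": {
    "index": "indextest-2016.12.04-small"
  }
}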

Thanks anyway!
