Nodes not linking up in cluster


#1

Hello again. I'm setting up a 3-node cluster across 3 separate IPs, but the cluster health shows that none of the nodes are connected to each other. Here's the output of:

  curl -uelastic -XGET 'cec-es-master:9200/_cluster/health?pretty'

  "cluster_name" : "i-love-nishiki",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 0,
  "active_primary_shards" : 0,
  "active_shards" : 0,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 384,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 0.0 

cec-es-master elasticsearch.yml:

  cluster.name: i-love-nishiki
  node.name: ${HOSTNAME}
  path.data: /var/lib/elasticsearch
  path.logs: /var/log/elasticsearch
  network.host: cec-es-master
  discovery.zen.ping.unicast.hosts: ["cec-es-data-0", "cec-es-data-1"]
  node.master: true
  node.data: false

cec-es-data-0 elasticsearch.yml:

  cluster.name: i-love-nishiki
  node.name: ${HOSTNAME}
  path.data: /var/lib/elasticsearch
  path.logs: /var/log/elasticsearch
  network.host: cec-es-data-0
  node.master: false 
  node.data: true
  discovery.zen.ping.unicast.hosts: ["cec-es-master", "cec-es-data-1"]

cec-es-data-1 elasticsearch.yml:

  cluster.name: i-love-nishiki
  node.name: ${HOSTNAME}
  path.data: /var/lib/elasticsearch
  path.logs: /var/log/elasticsearch
  network.host: cec-es-data-1
  discovery.zen.ping.unicast.hosts: ["cec-es-master", "cec-es-data-0"]
  node.master: false
  node.data: true
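For what it's worth, the 6.x Zen discovery documentation recommends that `discovery.zen.ping.unicast.hosts` list the master-eligible nodes, and that `discovery.zen.minimum_master_nodes` be set explicitly. A hedged sketch of what the data-node discovery settings could look like under that guidance (values assume `cec-es-master` remains the only master-eligible node):

```yaml
# Sketch only: seed discovery with the master-eligible node(s) and pin
# the master quorum. With a single master-eligible node this is 1;
# with three master-eligible nodes it would be 2 to avoid split-brain.
discovery.zen.ping.unicast.hosts: ["cec-es-master"]
discovery.zen.minimum_master_nodes: 1
```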

(Mark Walkom) #2

What do your logs show?


#3

[2019-01-15T00:00:16,213][WARN ][r.suppressed             ] [cec-es-master] path: /elastalert_status/elastalert_error, params: {index=elastalert_status, type=elastalert_error}
org.elasticsearch.action.UnavailableShardsException: [elastalert_status][1] primary shard is not active Timeout: [1m], request: [BulkShardRequest [[elastalert_status][1]] containing [index {[elastalert_status][elastalert_error][AWhQF6_yVkHSJqQNO0PB], source[{"message":"Error running query: TransportError(400, u'illegal_argument_exception', u'maxConcurrentShardRequests must be >= 1')","traceback":["Traceback (most recent call last):","  File \"/code/elastalert/elastalert.py\", line 390, in get_hits","    **extra_args","  File \"/usr/lib/python2.7/site-packages/elasticsearch/client/utils.py\", line 76, in _wrapped","    return func(*args, params=params, **kwargs)","  File \"/usr/lib/python2.7/site-packages/elasticsearch/client/__init__.py\", line 655, in search","    doc_type, '_search'), params=params, body=body)","  File \"/usr/lib/python2.7/site-packages/elasticsearch/transport.py\", line 314, in perform_request","    status, headers_response, data = connection.perform_request(method, url, params, body, headers=headers, ignore=ignore, timeout=timeout)","  File \"/usr/lib/python2.7/site-packages/elasticsearch/connection/http_requests.py\", line 90, in perform_request","    self._raise_error(response.status_code, raw_data)","  File \"/usr/lib/python2.7/site-packages/elasticsearch/connection/base.py\", line 125, in _raise_error","    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(status_code, error_message, additional_info)","RequestError: TransportError(400, u'illegal_argument_exception', u'maxConcurrentShardRequests must be >= 1')"],"data":{"query":{"sort":[{"@timestamp":{"order":"asc"}}],"query":{"bool":{"filter":{"bool":{"must":[{"range":{"@timestamp":{"gt":"2019-01-14T17:17:39.105796Z","lte":"2019-01-14T17:32:39.105796Z"}}},{"bool":{"must":[{"match_phrase":{"message":"Mongoose default connection disconnected"}},{"regexp":{"program":"pulse-refresh"}}]}}]}}}}},"rule":"Pulse Refresh DB Error Detector"},"@timestamp":"2019-01-15T05:59:16.208358Z"}]}]]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryBecauseUnavailable(TransportReplicationAction.java:932) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.retryIfUnavailable(TransportReplicationAction.java:778) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase.doRun(TransportReplicationAction.java:731) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.action.support.replication.TransportReplicationAction$ReroutePhase$2.onTimeout(TransportReplicationAction.java:892) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.cluster.ClusterStateObserver$ContextPreservingListener.onTimeout(ClusterStateObserver.java:317) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.cluster.ClusterStateObserver$ObserverClusterStateListener.onTimeout(ClusterStateObserver.java:244) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.cluster.service.ClusterApplierService$NotifyTimeout.run(ClusterApplierService.java:559) [elasticsearch-6.5.4.jar:6.5.4]
    at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:624) [elasticsearch-6.5.4.jar:6.5.4]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_111]

This is a new cluster I'm trying to create after removing an old version of ES and Kibana. The entire ELK stack had been broken for quite some time, and no one knew how it broke or even how it worked in the first place.


(Mark Walkom) #4

There should be more than that. Specifically, there is a period after the node starts up where it will try to connect to the other nodes, so seeing that part of the log would be very useful.


#5

My log just has that snippet repeated over and over.

Interestingly enough, deleting the discovery line from both data-node .yml files, but leaving it in the master node's .yml, is what fixed it. My cluster is up and all indices are green.
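For reference, the working data-node config described above (shown here for cec-es-data-0; cec-es-data-1 is the same apart from `network.host`) would look roughly like:

```yaml
cluster.name: i-love-nishiki
node.name: ${HOSTNAME}
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: cec-es-data-0
node.master: false
node.data: true
# discovery.zen.ping.unicast.hosts removed from the data nodes;
# only the cec-es-master elasticsearch.yml keeps its discovery line.
```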


(system) closed #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.