Restarted one node - Kibana reports all data nodes as down

Hi there
I am running a recreational cluster at home for firewall logs and some nginx traffic logs. Recently I added two nodes to my setup, so there are now two data nodes and one dedicated master (non-data) node.

Today I tried to restart the Elasticsearch service on one of the data nodes, and suddenly Kibana reported both data nodes as down. Only the master was reported as up.

/_cat/shards shows a bunch of unassigned shards after the reallocation stops, and restarting the node again resulted in even more. From the looks of it, all primary shards have been allocated, but the replicas have not.

Checking one of the indices with unassigned shards shows that replication is enabled:

{
  "fortigate-2019.11.19" : {
    "settings" : {
      "index" : {
        "creation_date" : "1574121601277",
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "uuid" : "_lUL_6mjQYKGtP_LJ5K3ww",
        "version" : {
          "created" : "6080299",
          "upgraded" : "7050099"
        },
        "provided_name" : "fortigate-2019.11.19"
      }
    }
  }
}
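
For reference, the settings above can be pulled with something like this (same host and index as in my case; adjust as needed):

curl -s http://192.168.70.150:9200/fortigate-2019.11.19/_settings?pretty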

All nodes seem to be detected from each of the members:

{
  "cluster_name" : "siem",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 2,
  "active_primary_shards" : 2182,
  "active_shards" : 3398,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 965,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 77.88219115287646
}
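
That output is from the cluster health API, which can be run against any of the nodes, e.g.:

curl -s http://192.168.70.150:9200/_cluster/health?pretty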

Cluster config (grep -e '^[^#]' /etc/elasticsearch/elasticsearch.yml)

cluster.name: siem
node.name: siem-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.70.150
discovery.seed_hosts:
    - 192.168.70.150
    - 192.168.70.161
cluster.max_shards_per_node: 4000

cluster.name: siem
node.name: siem-2
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.70.161
discovery.seed_hosts:
    - 192.168.70.150
    - 192.168.70.161
cluster.max_shards_per_node: 4000

cluster.name: siem
node.name: siem-master
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 192.168.70.162
discovery.seed_hosts:
    - 192.168.70.150
    - 192.168.70.161
    - 192.168.70.162
node.master: true
node.voting_only: false
node.data: false
node.ingest: false
node.ml: false
xpack.ml.enabled: true
cluster.remote.connect: false

The cluster log shows a bunch of these messages:
Caused by: org.elasticsearch.action.UnavailableShardsException: [.monitoring-es-7-2019.12.20][0] primary shard is not active Timeout: [1m]

I can probably fix this by running one of the allocation scripts, but I'd rather understand why this happened, if anyone would be up for explaining.
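
For context, the kind of thing I had in mind was asking the cluster to retry the failed allocations, roughly like this (a sketch, not something I have run yet):

curl -s -X POST http://192.168.70.150:9200/_cluster/reroute?retry_failed=true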

Kind regards,
Patrik

Ok, I feel silly. I have the Elastic apt repo configured on all nodes, and an upgrade took place on one of them, so the data nodes ended up on different versions. This is why the replication did not work.
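
To keep apt from upgrading a single node on its own again, holding the package should help (I have not rolled this out on every node yet, so treat it as a sketch):

sudo apt-mark hold elasticsearch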

If anyone else bumps into this, here is how I came to that conclusion and "solved" it:

Finding the reason for the unassigned shards:

curl -s http://192.168.70.150:9200/_cluster/allocation/explain?pretty

This said that the cluster could not allocate the replica shards to the peer node due to a difference in Elasticsearch versions between the nodes.
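
The version mismatch can also be confirmed directly with the cat nodes API, e.g.:

curl -s 'http://192.168.70.150:9200/_cat/nodes?v&h=name,version'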

This command also gave some information:
curl -s http://192.168.70.161:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason | grep UNASSIGNED

In the end I stopped the Elasticsearch process on each of the nodes, then started them one by one. Looks like it is recovering now. Fingers crossed. :slight_smile:
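
For anyone doing this more carefully than I did: the usual advice is to disable replica allocation before restarting a node and re-enable it afterwards, roughly like this (from memory, so double-check against the rolling restart docs):

curl -s -X PUT http://192.168.70.150:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'
sudo systemctl restart elasticsearch
curl -s -X PUT http://192.168.70.150:9200/_cluster/settings -H 'Content-Type: application/json' -d '{"persistent": {"cluster.routing.allocation.enable": null}}'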

Kind regards,
Patrik
