Shards failed in Network screen

gnordli · October 21, 2020, 6:02am

I am using the Elastic Endpoint agents with 7.9.2.

Elasticsearch is running in a single node cluster

I am getting an error when in the network screen saying 5 of 7 shards have failed.

curl -u elastic -X GET "127.0.0.1:9200/_cluster/health?pretty=true"

   {
      "cluster_name" : "elasticsearch",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 287,
      "active_shards" : 287,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 20,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 93.48534201954396

One of the indices is failing. I am not sure how to fix it.

curl -u elastic -X GET "127.0.0.1:9200/_cluster/allocation/explain?pretty=true"

   {
      "index" : ".ds-logs-endpoint.events.network-default-000001",
      "shard" : 0,
      "primary" : false,
      "current_state" : "unassigned",
      "unassigned_info" : {
        "reason" : "CLUSTER_RECOVERED",
        "at" : "2020-10-20T19:32:30.113Z",
        "last_allocation_status" : "no_attempt"
      },
      "can_allocate" : "no",
      "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes",
      "node_allocation_decisions" : [
        {
          "node_id" : "Q2WsarzlQMOXy-Ptu5RF2A",
          "node_name" : "ml-monitor2",
          "transport_address" : "127.0.0.1:9300",
          "node_attributes" : {
            "ml.machine_memory" : "8349188096",
            "xpack.installed" : "true",
            "transform.node" : "true",
            "ml.max_open_jobs" : "20"
          },
          "node_decision" : "no",
          "deciders" : [
            {
              "decider" : "same_shard",
              "decision" : "NO",
              "explanation" : "a copy of this shard is already allocated to this node [[.ds-logs-endpoint.events.network-default-000001][0], node[Q2WsarzlQMOXy-Ptu5RF2A], [P], s[STARTED], a[id=FH2eDC0-QUWUtUpEwt2Uyw]]"
            }
          ]
        }
      ]
    }

I have lots of duplicate shards; this is a small subset. I am not sure how to clean them up. For each index, one copy is started and the other is unassigned.

curl -u elastic -X GET "127.0.0.1:9200/_cat/shards?pretty=true"

.ds-logs-endpoint.events.network-default-000001   0 p STARTED    50943  19.7mb 127.0.0.1 ml-monitor2
.ds-logs-endpoint.events.network-default-000001   0 r UNASSIGNED                           
.ds-metrics-system.process_summary-default-000001 0 p STARTED    10628   2.1mb 127.0.0.1 ml-monitor2
.ds-metrics-system.process_summary-default-000001 0 r UNASSIGNED                           
.ds-metrics-system.cpu-default-000001             0 p STARTED    10628   3.1mb 127.0.0.1 ml-monitor2
.ds-metrics-system.cpu-default-000001             0 r UNASSIGNED                           
.siem-signals-default-000001                      0 p STARTED        0    208b 127.0.0.1 ml-monitor2
.siem-signals-default-000001                      0 r UNASSIGNED

Any thoughts?
thanks,
Geoff
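
The `_cat/shards` listing above follows one pattern: every started primary has an unassigned replica next to it. A minimal Python sketch, assuming the default column order (index, shard, prirep, state, ...) and using a hypothetical helper name `unassigned_replicas`, makes that pattern explicit. It only parses pasted text and does not talk to a cluster.

```python
# Two of the index pairs from the listing above, copied as sample input.
SAMPLE = """\
.ds-logs-endpoint.events.network-default-000001   0 p STARTED    50943  19.7mb 127.0.0.1 ml-monitor2
.ds-logs-endpoint.events.network-default-000001   0 r UNASSIGNED
.ds-metrics-system.cpu-default-000001             0 p STARTED    10628   3.1mb 127.0.0.1 ml-monitor2
.ds-metrics-system.cpu-default-000001             0 r UNASSIGNED
"""


def unassigned_replicas(cat_shards_text):
    """Return (index, shard) pairs whose replica is UNASSIGNED while the primary is STARTED."""
    started_primaries = set()
    unassigned = []
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        index, shard, prirep, state = fields[:4]
        if prirep == "p" and state == "STARTED":
            started_primaries.add((index, shard))
        elif prirep == "r" and state == "UNASSIGNED":
            unassigned.append((index, shard))
    # Filter after the loop so the result does not depend on line order.
    return [key for key in unassigned if key in started_primaries]


if __name__ == "__main__":
    for index, shard in unassigned_replicas(SAMPLE):
        print(f"{index} shard {shard}: replica unassigned, primary started")
```

If every unassigned shard shows up in this list, the data itself is intact and only the replica copies are missing.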

Frank_Hassanabad · October 21, 2020, 2:47pm

Does this thread help?

I think you have only one node, which is why it's yellow to start with. If you only want one node, I think the advice in that thread should fix you up here.
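
The yellow status can be checked against the numbers in the health output earlier in the thread. A small Python sketch, with hypothetical helper names, reproduces the reported percentage and shows why a single node cannot hold a replica: the `same_shard` decider in the allocation explanation refuses to place a replica on the node that already holds its primary, so full allocation needs at least `number_of_replicas + 1` data nodes.

```python
def active_shards_percent(active, initializing, unassigned):
    """Share of shard copies that are active, as a percentage."""
    total = active + initializing + unassigned
    return 100.0 * active / total if total else 100.0


def min_data_nodes(number_of_replicas):
    """Each copy of a shard must live on a different node."""
    return number_of_replicas + 1


if __name__ == "__main__":
    # Values taken from the _cluster/health output in the first post.
    print(active_shards_percent(active=287, initializing=0, unassigned=20))
    print(min_data_nodes(1))  # one replica needs two data nodes
    print(min_data_nodes(0))  # no replicas fits on a single node
```

With 287 active and 20 unassigned shards the result is roughly 93.49, which agrees with `active_shards_percent_as_number` in the health response.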

gnordli · October 21, 2020, 3:51pm

Yes, I see it now.

I was able to set the number of replicas to 0 on all of those indices:

PUT /*/_settings
{
  "index" : {
    "number_of_replicas" : 0
  }
}

I am going to be primarily using single-node clusters in my deployments. Is there a way to set the replicas to 0 during setup?

thanks!!

Frank_Hassanabad · October 21, 2020, 5:01pm

Most things, such as Beats, use index templates, so you could change it within your index templates, and when rollovers happen you should be OK (caveat: I haven't done this before personally).
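
A minimal sketch of the template idea in Python, which only builds the JSON body for a composable index template (`PUT _index_template/<name>`). The helper name, the pattern `my-logs-*`, and the priority value are placeholders, not names from this thread. Whether a custom template takes precedence over the templates installed for the Endpoint data streams is not confirmed here, and templates for data streams need additional fields that are not shown.

```python
import json


def zero_replica_template(index_patterns, priority=200):
    """Build a composable index template body that sets number_of_replicas to 0.

    index_patterns and priority are illustrative values chosen by the caller.
    """
    return {
        "index_patterns": list(index_patterns),
        "priority": priority,
        "template": {
            "settings": {
                "index": {"number_of_replicas": 0}
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical pattern; replace it with the indices you actually create.
    print(json.dumps(zero_replica_template(["my-logs-*"]), indent=2))
```

A template only affects indices created after it exists, so indices that are already present still need the `_settings` call from the earlier post.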

gnordli · October 21, 2020, 6:06pm

It seems the failed replica shards were just a red herring. The cluster is now green.

I am now getting this error:

 "reason": "failed to find geo_point field [destination.geo.location]",

Here are more of the logs.

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 7,
    "successful": 2,
    "skipped": 1,
    "failed": 5,
    "failures": [
      {
        "shard": 0,
        "index": ".ds-logs-endpoint.events.file-default-000001",
        "node": "Q2WsarzlQMOXy-Ptu5RF2A",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [destination.geo.location]",
          "index_uuid": "K8sqCAHURQOsteiDqh_k-w",
          "index": ".ds-logs-endpoint.events.file-default-000001"
        }
      },
      {
        "shard": 0,
        "index": ".ds-logs-endpoint.events.library-default-000001",
        "node": "Q2WsarzlQMOXy-Ptu5RF2A",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [destination.geo.location]",
          "index_uuid": "SW-keiFgSvKpxBnONpZQwA",
          "index": ".ds-logs-endpoint.events.library-default-000001"
        }
      },
      {
        "shard": 0,
        "index": ".ds-logs-endpoint.events.process-default-000001",
        "node": "Q2WsarzlQMOXy-Ptu5RF2A",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [destination.geo.location]",
          "index_uuid": "iM2dtA2XQTuqnmNDCCOrEQ",
          "index": ".ds-logs-endpoint.events.process-default-000001"
        }
      },
      {
        "shard": 0,
        "index": ".ds-logs-endpoint.events.registry-default-000001",
        "node": "Q2WsarzlQMOXy-Ptu5RF2A",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [destination.geo.location]",
          "index_uuid": "-d1EaEbpQQihrvMdDEmHaw",
          "index": ".ds-logs-endpoint.events.registry-default-000001"
        }
      },
      {
        "shard": 0,
        "index": ".ds-logs-endpoint.events.security-default-000001",
        "node": "Q2WsarzlQMOXy-Ptu5RF2A",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to find geo_point field [destination.geo.location]",
          "index_uuid": "VPBqTG0mQEykv9uF5dTiLQ",
          "index": ".ds-logs-endpoint.events.security-default-000001"
        }
      }
    ]
  },
  "hits": {

When I go into Index Management in Kibana, I don't see any indices that start with .ds. I don't see anything in the Data Streams area either.

I am not sure where to go from here...

thanks,
Geoff
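
The failures in the response above all share one reason. A short Python sketch, assuming the `_shards.failures` layout shown in the log and a hypothetical helper name `failures_by_reason`, groups the failed indices by exception type and message so a repeated cause stands out. The sample is abbreviated to two of the five entries.

```python
from collections import defaultdict

# Abbreviated copy of the search response posted above (two of five failures).
SAMPLE = {
    "_shards": {
        "failures": [
            {
                "shard": 0,
                "index": ".ds-logs-endpoint.events.file-default-000001",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to find geo_point field [destination.geo.location]",
                },
            },
            {
                "shard": 0,
                "index": ".ds-logs-endpoint.events.library-default-000001",
                "reason": {
                    "type": "query_shard_exception",
                    "reason": "failed to find geo_point field [destination.geo.location]",
                },
            },
        ]
    }
}


def failures_by_reason(response):
    """Map (exception type, message) to the list of indices that failed with it."""
    grouped = defaultdict(list)
    for failure in response.get("_shards", {}).get("failures", []):
        reason = failure.get("reason", {})
        key = (reason.get("type"), reason.get("reason"))
        grouped[key].append(failure.get("index"))
    return dict(grouped)


if __name__ == "__main__":
    for (exc_type, message), indices in failures_by_reason(SAMPLE).items():
        print(f"{exc_type}: {message}")
        for name in indices:
            print(f"  {name}")
```

Every entry here is a `query_shard_exception` for the same field, which suggests the problem lies with the query and the index mappings rather than with shard health.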

Frank_Hassanabad · October 22, 2020, 12:29am

@gnordli,

My good colleague @spong pointed me to this ticket and I think at this point you are seeing the same thing:

It looks like there are some workarounds and explanations in there, along with fixes from 6 days ago that will end up in the upcoming 7.10.0 release.

gnordli · October 22, 2020, 6:54pm

@Frank_Hassanabad

Yes, that looks like it. I guess I will just ignore it for now as it is fixed in 7.10.

thanks!!

system · November 19, 2020, 6:55pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
