Shards Not Being Allocated To Nodes

vilakazil · October 13, 2021, 7:59am

Hope to get assistance here as I have been struggling.

I have 11 nodes running Elasticsearch, 2 of them master nodes, with 5 shards and 1 replica (the defaults).

We recently ran patches on the nodes and upgraded the .NET software. Since then, only 2 shards are allocated, to 2 nodes, and the other 9 nodes do not have any shards allocated to them. I have run _cluster/reroute?retry_failed=true using Postman, but this has not helped.

We see the below statuses.
allocation status: no valid shard copy
allocation status: no attempt
allocate explanation : cannot allocate because all found copies of the shard are either stale or corrupt
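
For reference, statuses like the ones above are what the cluster allocation explain API reports for a single shard. Below is a minimal Python sketch of how the two calls could be built for inspection. The base URL, the index name `my_index`, and the helper names are assumptions for illustration only, and nothing is sent to a cluster.

```python
import json
from urllib.request import Request

# Assumption: a node reachable at this address; substitute a real node.
ES_URL = "http://localhost:9200"


def build_explain_request(index, shard, primary, base_url=ES_URL):
    """Build (but do not send) a cluster allocation explain request."""
    body = {"index": index, "shard": shard, "primary": primary}
    return Request(
        url=f"{base_url}/_cluster/allocation/explain",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_retry_failed_request(base_url=ES_URL):
    """Build the reroute call that retries shards whose allocation failed."""
    return Request(
        url=f"{base_url}/_cluster/reroute?retry_failed=true",
        data=b"",
        method="POST",
    )


if __name__ == "__main__":
    # "my_index" is a placeholder index name, not one from this cluster.
    req = build_explain_request("my_index", 0, True)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req).read()
```

Running the explain call once per unassigned shard (primary first) shows, node by node, why that copy was rejected.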

The above is causing my Indexer IIS application not to write any new documents to the data directories.

I am running version 5.6 of Elasticsearch.

warkolm · October 13, 2021, 8:01am

Welcome to our community!

5.X is extremely old and well past EOL. Please upgrade as a matter of urgency!

What is the output from the _cluster/stats?pretty&human API?
What do your Elasticsearch logs show?

vilakazil · October 13, 2021, 9:03am

Thanks Mark.

Output:

{
    "_nodes": {
        "total": 11,
        "successful": 11,
        "failed": 0
    },
    "cluster_name": "elasticsearch",
    "timestamp": 1634114914482,
    "status": "red",
    "indices": {
        "count": 1,
        "shards": {
            "total": 2,
            "primaries": 1,
            "replication": 1.0,
            "index": {
                "shards": {
                    "min": 2,
                    "max": 2,
                    "avg": 2.0
                },
                "primaries": {
                    "min": 1,
                    "max": 1,
                    "avg": 1.0
                },
                "replication": {
                    "min": 1.0,
                    "max": 1.0,
                    "avg": 1.0
                }
            }
        },
        "docs": {
            "count": 300,
            "deleted": 140
        },
        "store": {
            "size": "27.6gb",
            "size_in_bytes": 29647417194,
            "throttle_time": "0s",
            "throttle_time_in_millis": 0
        },
        "fielddata": {
            "memory_size": "0b",
            "memory_size_in_bytes": 0,
            "evictions": 0
        },
        "query_cache": {
            "memory_size": "67.2kb",
            "memory_size_in_bytes": 68904,
            "total_count": 4765190,
            "hit_count": 252225,
            "miss_count": 4512965,
            "cache_size": 0,
            "cache_count": 2885,
            "evictions": 2885
        },
        "completion": {
            "size": "0b",
            "size_in_bytes": 0
        },
        "segments": {
            "count": 5,
            "memory": "65.8kb",
            "memory_in_bytes": 67440,
            "terms_memory": "21.5kb",
            "terms_memory_in_bytes": 22087,
            "stored_fields_memory": "1.7kb",
            "stored_fields_memory_in_bytes": 1776,
            "term_vectors_memory": "1.5kb",
            "term_vectors_memory_in_bytes": 1632,
            "norms_memory": "1.2kb",
            "norms_memory_in_bytes": 1280,
            "points_memory": "197b",
            "points_memory_in_bytes": 197,
            "doc_values_memory": "39.5kb",
            "doc_values_memory_in_bytes": 40468,
            "index_writer_memory": "0b",
            "index_writer_memory_in_bytes": 0,
            "version_map_memory": "0b",
            "version_map_memory_in_bytes": 0,
            "fixed_bit_set": "352b",
            "fixed_bit_set_memory_in_bytes": 352,
            "max_unsafe_auto_id_timestamp": -1,
            "file_sizes": {}
        }
    },
    "nodes": {
        "count": {
            "total": 11,
            "data": 11,
            "coordinating_only": 0,
            "master": 2,
            "ingest": 0
        },
        "versions": [
            "5.6.16"
        ],
        "os": {
            "available_processors": 164,
            "allocated_processors": 164,
            "names": [
                {
                    "name": "Windows Server 2012 R2",
                    "count": 11
                }
            ],
            "mem": {
                "total": "139.9gb",
                "total_in_bytes": 150317256704,
                "free": "32.3gb",
                "free_in_bytes": 34764726272,
                "used": "107.6gb",
                "used_in_bytes": 115552530432,
                "free_percent": 23,
                "used_percent": 77
            }
        },
        "process": {
            "cpu": {
                "percent": 2
            },
            "open_file_descriptors": {
                "min": -1,
                "max": -1,
                "avg": 0
            }
        },
        "jvm": {
            "max_uptime": "1.8d",
            "max_uptime_in_millis": 159343516,
            "versions": [
                {
                    "version": "1.8.0_301",
                    "vm_name": "Java HotSpot(TM) 64-Bit Server VM",
                    "vm_version": "25.301-b09",
                    "vm_vendor": "Oracle Corporation",
                    "count": 11
                }
            ],
            "mem": {
                "heap_used": "15.4gb",
                "heap_used_in_bytes": 16632632416,
                "heap_max": "62.9gb",
                "heap_max_in_bytes": 67550838784
            },
            "threads": 1575
        },
        "fs": {
            "total": "4.2tb",
            "total_in_bytes": 4724419940352,
            "free": "3.5tb",
            "free_in_bytes": 3908139868160,
            "available": "3.5tb",
            "available_in_bytes": 3908139868160
        },
        "plugins": [],
        "network_types": {
            "transport_types": {
                "netty4": 11
            },
            "http_types": {
                "netty4": 11
            }
        }
    }
}
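
The handful of fields that matter for this problem can be pulled out of a response like the one above with a few lines of Python. This is only a sketch: `summarize_cluster_stats` is a hypothetical helper, and `sample` is a trimmed copy of the values shown in the output, not a full response.

```python
def summarize_cluster_stats(stats):
    """Return the fields most relevant to shard allocation problems."""
    shards = stats["indices"]["shards"]
    counts = stats["nodes"]["count"]
    return {
        "status": stats["status"],
        "indices": stats["indices"]["count"],
        "shards_total": shards["total"],
        "primaries": shards["primaries"],
        "data_nodes": counts["data"],
        "master_eligible_nodes": counts["master"],
    }


# Trimmed copy of the values from the output above.
sample = {
    "status": "red",
    "indices": {"count": 1, "shards": {"total": 2, "primaries": 1}},
    "nodes": {"count": {"total": 11, "data": 11, "master": 2}},
}

if __name__ == "__main__":
    for key, value in summarize_cluster_stats(sample).items():
        print(f"{key}: {value}")
```

Read this way, the output reports a red cluster with 2 shards (1 primary) across 11 data nodes, against the 5 primaries and 1 replica described in the first post, and only 2 master-eligible nodes.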


**The ES logs showed the below yesterday, whereas today they say the master nodes are being detected.**

[2021-10-12T21:58:50,300][WARN ][r.suppressed             ] path: /default_sm_index_%2A/interactiondata/_search, params: {index=default_sm_index_*, type=interactiondata}
org.elasticsearch.action.search.SearchPhaseExecutionException: all shards failed


**My index logs show the below when making a bulk update:**

```
2021-10-13 10:51:26.686 [WRN] Tenant "default": failed to update index.
2021-10-13 10:52:39.283 [WRN] Tenant "default": Commit Bulk failed.
System.Exception: Invalid NEST response built from a unsuccessful low level call on POST: /_bulk
# Invalid Bulk items:
# Audit trail of this API call:
 - [1] BadRequest: Node: http://10.102.246.125:9200/ Took: 00:01:00.0833281
 - [2] MaxTimeoutReached: Took: -738075.08:52:39.2826415
# OriginalException: System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
```

vilakazil · October 13, 2021, 9:15am

Please also note that I have 1,280,870 documents waiting to be indexed.

warkolm · October 13, 2021, 8:36pm

The first suggestion would be to use a newer version of Elasticsearch.

vilakazil · October 14, 2021, 6:33am

Thanks Mark. Will test the compatibility between my indexer and ES 7.9 in the dev environment first.

Is there a workaround for this before upgrading, as this is a production issue?

Christian_Dahlqvist · October 14, 2021, 6:54am

As Mark pointed out, this version is very, very old and you should look to upgrade.

Having 2 master-eligible nodes is very bad. You should always look to have 3 master-eligible nodes in a cluster, as Elasticsearch relies on consensus to be able to elect a master. As you are running such an old version of Elasticsearch, you must also make sure you have discovery.zen.minimum_master_nodes defined and set to 2 in your node config. This will prevent your cluster from suffering from split-brain scenarios and the data loss these can cause. If you do not currently have this set correctly, it is possible that your data has been lost.
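
To make the arithmetic behind that setting concrete, here is a small Python sketch of the usual quorum rule for discovery.zen.minimum_master_nodes, which is (master-eligible nodes / 2) + 1 with integer division. The function names are illustrative and not part of any Elasticsearch client.

```python
def minimum_master_nodes(master_eligible):
    """Quorum of master-eligible nodes: (n // 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible):
    """How many master-eligible nodes can be lost while keeping a quorum."""
    return master_eligible - minimum_master_nodes(master_eligible)


if __name__ == "__main__":
    for n in (2, 3, 5):
        print(
            f"{n} master-eligible: quorum={minimum_master_nodes(n)}, "
            f"tolerated failures={tolerated_master_failures(n)}"
        )
```

With 2 master-eligible nodes the quorum is 2, so losing either node leaves the cluster unable to elect a master. With 3 the quorum is still 2, and one node can fail safely.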

You should also always make sure you back up your data using the snapshot and restore API, so that you do not lose it if you suffer a catastrophic failure.
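
As a rough illustration of the snapshot API, the Python sketch below builds the two requests involved: registering a shared filesystem repository and taking a snapshot. The repository name `my_backup`, the snapshot name `snapshot_1`, the location path, and the base URL are all placeholders, and the requests are only built, not sent. For an `fs` repository, the location also has to be listed under `path.repo` in elasticsearch.yml on every node.

```python
import json
from urllib.request import Request

# Assumption: a node reachable at this address; substitute a real node.
ES_URL = "http://localhost:9200"


def build_register_repo_request(repo, location, base_url=ES_URL):
    """Build the request that registers a shared filesystem repository."""
    body = {"type": "fs", "settings": {"location": location}}
    return Request(
        url=f"{base_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def build_snapshot_request(repo, snapshot, base_url=ES_URL):
    """Build the request that snapshots all indices and waits for completion."""
    return Request(
        url=f"{base_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        data=b"",
        method="PUT",
    )


if __name__ == "__main__":
    # All names and the path below are placeholders.
    for req in (
        build_register_repo_request("my_backup", "/mount/backups/my_backup"),
        build_snapshot_request("my_backup", "snapshot_1"),
    ):
        print(req.get_method(), req.full_url)
    # To actually send one: urllib.request.urlopen(req).read()
```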

system · November 11, 2021, 6:54am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
