Hi
I'm fairly new to Elasticsearch and I'm trying to maintain a small cluster.
Currently I'm having trouble with a growing number of pending tasks. In all the other threads I have looked at, the issue has been caused by having a large number of shards. However, I'm quite sure that is not the case here. Here is the output from /_cluster/health:
{
"status": "yellow",
"number_of_nodes": 3,
"unassigned_shards": 3,
"number_of_pending_tasks": 2254765,
"number_of_in_flight_fetch": 0,
"timed_out": false,
"active_primary_shards": 218,
"task_max_waiting_in_queue_millis": 43353576,
"relocating_shards": 0,
"active_shards_percent_as_number": 98.66071428571429,
"active_shards": 221,
"initializing_shards": 0,
"number_of_data_nodes": 2,
"delayed_unassigned_shards": 0
}
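To put those numbers in perspective, here is a minimal Python sketch that derives a few figures from the output above. The values are copied from the response; the `summarize` helper and its key names are made up for illustration and are not part of any Elasticsearch client.

```python
import json

# Values copied from the /_cluster/health output above.
health = json.loads("""
{
  "number_of_pending_tasks": 2254765,
  "task_max_waiting_in_queue_millis": 43353576,
  "active_shards": 221,
  "unassigned_shards": 3,
  "number_of_data_nodes": 2
}
""")


def summarize(h):
    """Derive a few human-readable figures from a cluster health response."""
    total_shards = h["active_shards"] + h["unassigned_shards"]
    return {
        # How long the oldest pending task has been queued, in hours.
        "oldest_task_wait_hours": round(
            h["task_max_waiting_in_queue_millis"] / 3_600_000, 1
        ),
        # Average number of active shards per data node.
        "active_shards_per_data_node": h["active_shards"] / h["number_of_data_nodes"],
        # Should match active_shards_percent_as_number in the response.
        "active_percent": round(100 * h["active_shards"] / total_shards, 2),
    }


print(summarize(health))
```

So the oldest task has been queued for roughly 12 hours, and with about 110 active shards per data node the shard count looks modest.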
We are running without replicas, except for a few select Elasticsearch system indices. The ILM policy is set to roll over at 20 GB or 30 days, with deletion after 60 days.
From what I understand, this should not in any way be able to cause the issue that we are seeing.
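For reference, the ILM policy described above would look roughly like the sketch below, written as a Python dict that can be serialized into the body of a `PUT _ilm/policy/<name>` request. This is an approximation, not a copy of the real policy: whether the 20 GB limit is `max_size` or `max_primary_shard_size` is an assumption, and so is the phase layout.

```python
import json

# Approximation of the policy described above (assumed, not the real one).
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {
                    # Roll over when the index reaches 20 GB or 30 days,
                    # whichever comes first. "max_size" is an assumption;
                    # it could also be "max_primary_shard_size".
                    "rollover": {"max_size": "20gb", "max_age": "30d"}
                }
            },
            "delete": {
                # For rolled-over indices, min_age counts from rollover.
                "min_age": "60d",
                "actions": {"delete": {}},
            },
        }
    }
}

print(json.dumps(ilm_policy, indent=2))
```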
The status is yellow because the periodic snapshot is failing. This might be the cause or at least have the same root cause.
The snapshot supposedly fails because there are 279 shard failures (primary shard is not allocated), but from what I can see this is not true.
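A minimal sketch of one way to check this claim: take the parsed JSON from `GET /_cat/shards?format=json&h=index,shard,prirep,state` and keep only the primaries that are unassigned. The index names in `sample` are hypothetical placeholders, not indices from this cluster, and `unassigned_primaries` is an illustrative helper.

```python
def unassigned_primaries(rows):
    """Return (index, shard) pairs for primary shards that are not allocated.

    `rows` is the parsed JSON list returned by
    GET /_cat/shards?format=json&h=index,shard,prirep,state
    """
    return [
        (row["index"], row["shard"])
        for row in rows
        if row["prirep"] == "p" and row["state"] == "UNASSIGNED"
    ]


# Hypothetical sample rows in the shape the _cat/shards API returns.
sample = [
    {"index": "logs-000001", "shard": "0", "prirep": "p", "state": "STARTED"},
    {"index": "logs-000002", "shard": "0", "prirep": "p", "state": "UNASSIGNED"},
    {"index": ".system-example", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
]

print(unassigned_primaries(sample))
```

If the snapshot error were accurate, a check like this should list hundreds of primaries. An empty list would support the impression that every primary is allocated.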
We did downgrade the "hot" node around the time the snapshot issue started. Some days later the node restarted because it ran out of memory, at which point the shards were in fact unavailable.
We tried to upgrade the node again, but the upgrade failed. However, every time we tried, the number of unassigned shards went down according to the overview, though not according to the snapshot menu.
(Side question: I couldn't find a way to get the shards assigned without attempting to change the config. Can anyone tell me how I could have done it?)
It seems that the cluster has ended up in a weird state where everything appears to be working except for the snapshot and the growing number of pending tasks.
Please help me learn and figure out how to fix this.