UNASSIGNED ALLOCATION_FAILED

I have an Elasticsearch 6.2.3 setup (single node). The system had been running smoothly for a while. I needed to expand the hard drive, since disk space was about to run out. After the expansion, Elasticsearch came up, but after some time the health status turned red:
curl -s 'localhost:9200/_cluster/health?pretty'
{
  "cluster_name" : "ClusterX",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 880,
  "active_shards" : 880,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 15,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 98.32402234636871
}

The reason is UNASSIGNED ALLOCATION_FAILED:
curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
indexA-2020.03.12 1 p UNASSIGNED ALLOCATION_FAILED
indexA-2020.03.12 4 p UNASSIGNED ALLOCATION_FAILED
...
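As a side note, the unassigned shards can be summarized per index with a short pipeline; the sample file below just mimics the truncated listing above, and against a live cluster you would pipe the curl output instead:

```shell
# Summarize unassigned shards per index from _cat/shards-style output.
# shards.txt mimics the listing above; on a live cluster you would use:
#   curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason'
cat <<'EOF' > shards.txt
indexA-2020.03.12 1 p UNASSIGNED ALLOCATION_FAILED
indexA-2020.03.12 4 p UNASSIGNED ALLOCATION_FAILED
indexB-2020.03.14 2 p STARTED
EOF
awk '$4 == "UNASSIGNED" { count[$1]++ } END { for (i in count) print i, count[i] }' shards.txt
# prints: indexA-2020.03.12 2
```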
The _cluster/allocation/explain call says that too many files are open:
curl -s 'localhost:9200/_cluster/allocation/explain?pretty'
{
  "index" : "indexA-2020.03.12",
  "shard" : 4,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "ALLOCATION_FAILED",
    "at" : "2020-04-22T10:40:10.827Z",
    "failed_allocation_attempts" : 5,
    "details" : "failed shard on node [MXMUEqXRTxKfiSQCeQQvTg]: failed recovery, failure RecoveryFailedException[[indexA-2020.03.12][4]: Recovery failed on {MXMUEqX}{MXMUEqXRTxKfiSQCeQQvTg}{wsB9s7OWSRq7zLXX26d5NQ}{127.0.0.1}{127.0.0.1:9300}{rack_id=r1}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/opt/elasticsearch/nodes/0/indices/2G7F7nCQT7y_yMCybVV9xQ/4/translog/translog-251.ckp: Too many open files]; ",
    "last_allocation_status" : "no"
  },
  "can_allocate" : "no",
  "allocate_explanation" : "cannot allocate because allocation is not permitted to any of the nodes that hold an in-sync shard copy",
  "node_allocation_decisions" : [
    {
      "node_id" : "MXMUEqXRTxKfiSQCeQQvTg",
      "node_name" : "MXMUEqX",
      "transport_address" : "127.0.0.1:9300",
      "node_attributes" : {
        "rack_id" : "r1"
      },
      "node_decision" : "no",
      "store" : {
        "in_sync" : true,
        "allocation_id" : "_YsZ8lYWT069lYzts-Eg2A"
      },
      "deciders" : [
        {
          "decider" : "max_retry",
          "decision" : "NO",
          "explanation" : "shard has exceeded the maximum number of retries [5] on failed allocation attempts - manually call [/_cluster/reroute?retry_failed=true] to retry, [unassigned_info[[reason=ALLOCATION_FAILED], at[2020-04-22T10:40:10.827Z], failed_attempts[5], delayed=false, details[failed shard on node [MXMUEqXRTxKfiSQCeQQvTg]: failed recovery, failure RecoveryFailedException[[indexA-2020.03.12][4]: Recovery failed on {MXMUEqX}{MXMUEqXRTxKfiSQCeQQvTg}{wsB9s7OWSRq7zLXX26d5NQ}{127.0.0.1}{127.0.0.1:9300}{rack_id=r1}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/opt/elasticsearch/nodes/0/indices/2G7F7nCQT7y_yMCybVV9xQ/4/translog/translog-251.ckp: Too many open files]; ], allocation_status[deciders_no]]]"
        }
      ]
    }
  ]
}
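The max_retry decider above names the fix itself: once the underlying file-descriptor problem is solved, the failed allocations must be retried manually. A minimal sketch, assuming the same localhost cluster:

```shell
# The decider gives up after 5 failed attempts. After raising the
# file-descriptor limit, ask the cluster to retry those allocations:
ES='localhost:9200'
echo "POST ${ES}/_cluster/reroute?retry_failed=true"
# Against the live cluster:
#   curl -XPOST "${ES}/_cluster/reroute?retry_failed=true&pretty"
```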

Once I truncate the translog with the elasticsearch-translog truncate tool, the system comes up, but after a while the same issue occurs again, with an index from a different date.
All those dates are before the date when the partition was expanded.
Any idea how to fix this permanently?

Thanks

Welcome!

You probably have too many shards per node.

May I suggest you look at the following resources about sizing:

https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right

Also, this message:

Too many open files

seems to indicate that as well. Make sure you have enough file descriptors available. See https://www.elastic.co/guide/en/elasticsearch/reference/current/file-descriptors.html
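On the node itself, the effective limits can be checked from a shell. The config lines in the comments are the usual places to raise the limit (an assumption about how the node is started, not something stated in this thread):

```shell
# Soft and hard file-descriptor limits of the current shell:
ulimit -Sn
ulimit -Hn
# Typical places to raise the limit for the Elasticsearch process
# (assumptions about the setup; adjust to how the node is started):
#   /etc/security/limits.conf:  elasticsearch - nofile 65536
#   systemd unit override:      LimitNOFILE=65536
```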

I have 65536 fd configured.
curl -X GET "localhost:9200/_nodes/stats/process?filter_path=**.max_file_descriptors&pretty"
{
  "nodes" : {
    "U45GBT3kRo-Z1pj8mxDr5Q" : {
      "process" : {
        "max_file_descriptors" : 65536
      }
    }
  }
}
and I have 6 shards per index.
The really weird part is that it only started occurring after the disk expansion. I also don't understand why it creeps up while Elasticsearch is running.
Remember, ES functions for a while (returns queries), and then at some point the status switches to red. I could understand this during startup, when ES initializes the shards, but not during normal operation.
Do you think that 6 shards are too many?

You don't have 6 shards, but 880 active + 15 unassigned, so 895 shards allocated to one single node.

Sorry, I meant 6 shards per index.

Are 1000+ shards per node an issue?

It depends. But if you look at the resources I shared with you, you will see that we don't recommend having more than 20 shards per GB of heap.
And as we don't recommend using a heap bigger than 30 GB, 600 shards should be the maximum.

How much HEAP do you have?

The heap size is set to 24 GB. Given that, ~900 shards for a single node are too many.
Is it just a coincidence that the issue appeared after the disk expansion? From what you are saying, it almost seems like it.
Is there an easy way to reduce the shards per index?
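The arithmetic behind that verdict can be sketched in shell (the 20-shards-per-GB figure is the rule of thumb quoted above, not an exact limit):

```shell
# Rule of thumb from above: at most ~20 shards per GB of heap.
heap_gb=24
budget=$((heap_gb * 20))
actual=895                  # 880 active + 15 unassigned shards on this node
echo "budget=${budget} actual=${actual}"   # prints: budget=480 actual=895
```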

There's the shrink API.
Otherwise you need to reindex. The reindex API might help.
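A sketch of what a shrink could look like on this 6.x setup. The index name is only an example, and the target shard count must evenly divide the source count, so a 6-shard index can shrink to 3, 2, or 1 shards; the live commands are left as comments since they assume a running cluster:

```shell
# The shrink target's shard count must evenly divide the source count:
src_shards=6
for t in 3 2 1; do
  [ $((src_shards % t)) -eq 0 ] && echo "6 -> ${t} shards is a valid shrink"
done
# Against the live cluster (hypothetical index name):
#   1) make the index read-only so it can be shrunk:
#      curl -XPUT 'localhost:9200/app_3-2020.04.12/_settings' \
#        -H 'Content-Type: application/json' \
#        -d '{"index.blocks.write": true}'
#   2) shrink it into a single-shard copy:
#      curl -XPOST 'localhost:9200/app_3-2020.04.12/_shrink/app_3-2020.04.12-shrunk' \
#        -H 'Content-Type: application/json' \
#        -d '{"settings": {"index.number_of_shards": 1}}'
```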

What is the output of:

GET /
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Once I get the info, I will share them at gist.github.com.

The interesting part is that I have a much smaller setup (only 4 GB of heap), and that system has 1500 shards with no issues:
curl -XGET localhost:9200/_cluster/health?pretty
{
  "cluster_name" : "ClusterX",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 1501,
  "active_shards" : 1501,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Granted, in this setup the data stored in ES is really small compared to the system that has the issue:
Small setup:
curl -XGET 'http://localhost:9200/_cat/allocation?v'
shards disk.indices disk.used disk.avail disk.total disk.percent host ip node
1501 1.8gb 16.9gb 91gb 107.9gb 15 127.0.0.1 127.0.0.1 DB0h-QU

I don't have the disk usage of the other system.

Could the amount of data cause the issue?

Thanks

Have a look at this blog post.

See the output of the ES queries:

Thanks

You have a lot of "small" indices with 6 shards. Let's take this one as an example:

green  open   app_3-2020.04.12                           c4Q2Ik3YTDSYXbWkiXMBSw   6   0     736972            0     44.2mb         44.2mb

6 shards for 44.2mb is a waste of resources. It's like running 6 MySQL databases on the same physical machine to hold 44 MB of data...

The biggest one seems to be:

green  open   app_2-2020.04.10                           PPTEt1atTG6MlG3Pl7jqpg   6   0  407957499            0     33.3gb         33.3gb

I think that 1 or 2 shards should be enough to manage them.

So in general, if you reduce all indices from 6 shards to 1 shard, you will end up with around 250 shards for your node, which will probably be fine.

Hope this helps.

Thanks David.
yes, I agree there are a lot of indices, where we can improve. That helps.
One more question: Having an index with a size of about 300 gb. Are 6 shards fine for those?

The ideal maximum shard size is 20 to 50 GB, depending on your own tests.
It could work, but maybe you'll need more.

The issue was that there were indices with a size of a couple of MB, but a lot of files (3k+) were created for those. There was even one index which had 2300 documents but 3700 files. After shrinking this index to just one shard, the number of files dropped to fewer than 200.
How is it possible that so many files were created for such a small index?

Lucene needs some files to run. Also, Lucene has this concept of segments, which can be merged when needed, or force-merged if you call the Force Merge API.

Plus the transaction log...

All that can lead to many files.
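For completeness, a sketch of a force merge call on an index that is no longer written to (hypothetical index name; merging down to one segment is generally only advisable for read-only indices):

```shell
ES='localhost:9200'
INDEX='indexA-2020.03.12'    # hypothetical index that is no longer updated
echo "POST ${ES}/${INDEX}/_forcemerge?max_num_segments=1"
# Against the live cluster:
#   curl -XPOST "${ES}/${INDEX}/_forcemerge?max_num_segments=1&pretty"
```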

One reason might be that you are creating a lot of small segments by frequently updating the same document, or perhaps scroll queries are interfering with merging. How do you use that index?

Those indices get updated only for 24 hours. Once an hour, a series of queries is performed against specific keys, and for each key one document of the index is updated with the retrieved aggregation. So in that specific example, we only had 2300 documents, each updated once an hour.

Hi David,
I think it is a very good idea to use the _forcemerge API for indices which are not updated anymore. That could save a lot of I/O resources in the long run, especially if there is a requirement to keep data for a very long time (365+ days) for historical reasons.