After a cluster reboot, the Kibana indices cannot be recovered. I see the following entries in the log:
[2024-05-28T05:15:49,289][DEBUG][o.e.g.G.InternalPrimaryShardAllocator] [sd-07a1-ae2b.local] [[.kibana_2/4sWnjATIQ7WDrOTqWYr7fA]][0]: found 0 allocation candidates of [.kibana_2][0], node[null], [P], recovery_source[existing store recovery; bootstrap_history_uuid=false], s[UNASSIGNED], unassigned_info[[reason=MANUAL_ALLOCATION], at[2024-05-26T00:42:05.578Z], delayed=false, details[failed shard on node [4YnGQHt5SvKcQpdtROrImA]: failed recovery, failure RecoveryFailedException[[.kibana_2][0]: Recovery failed on {sd-c3d2-b7ca.local}{4YnGQHt5SvKcQpdtROrImA}{LaA75lNsQgac6B-x627_Uw}{sd-c3d2-b7ca.local}{10.115.205.88:9300}{ml.machine_memory=67387195392, xpack.installed=true, ml.max_open_jobs=20, ml.enabled=true}]; nested: IndexShardRecoveryException[failed to recover from gateway]; nested: FileSystemException[/opt/gicapods/elasticdata/nodes/0/indices/4sWnjATIQ7WDrOTqWYr7fA/0/_state: **Too many open files**]; ], allocation_status[no_valid_shard_copy]] based on allocation ids: [[DmpGyU8ZSE6sGJJrEzRelw]]
[2024-05-28T05:15:49,289][DEBUG][o.e.g.G.InternalPrimaryShardAllocator] [sd-07a1-ae2b.local] [[.kibana_2/4sWnjATIQ7WDrOTqWYr7fA]][0]: not allocating, number_of_allocated_shards_found [0]
I've increased the open files limit to 1 million in `limits.conf` and applied the same limit to the running elasticsearch process with the `prlimit` command. `lsof` shows that ~300k files are currently open by the elasticsearch process, but I'm still seeing the above message in the log.
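To double-check that the new limit actually reached the running process, something like the sketch below could be used. It assumes Linux with `/proc` mounted and enough privileges to read the process's entries. The function names `parse_max_open_files` and `fd_usage` are made up for illustration, and the PID is passed on the command line.

```python
import os
import sys


def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Values are returned as strings because a limit may be 'unlimited'.
    Returns None if the row is not present.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns: soft limit, hard limit, units
            fields = line[len(prefix):].split()
            return fields[0], fields[1]
    return None


def fd_usage(pid):
    """Read the effective limits and count open descriptors for a PID (Linux only)."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_max_open_files(fh.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    return limits, open_fds


if __name__ == "__main__" and len(sys.argv) == 2:
    limits, open_fds = fd_usage(int(sys.argv[1]))
    print(f"soft/hard limit: {limits}, open descriptors: {open_fds}")
```

The count from `/proc/<pid>/fd` may differ from the `lsof` figure, since `lsof` output can also list entries that are not file descriptors, such as memory-mapped files.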
Does this issue really have something to do with open files?
This is a 2-node cluster running version 6.8.