We are having issues with Elasticsearch 7.4.1: the data nodes build up backlogs on the management thread pool, and then all cluster communication stops. I just noticed that in the documentation https://www.elastic.co/guide/en/elasticsearch/reference/7.4/modules-threadpool.html the management thread pool is mentioned for 7.4 but is gone in 7.5. Is cluster management completely different in 7.5? If so, that would make 7.5 worth looking at.
Right now, if we index more than roughly 20k documents/second, we run into this backlog in cluster management.
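
For anyone who wants to watch the same backlog, here is a minimal sketch of one way to do it with the `_cat/thread_pool` API, which reports `active`, `queue`, and `rejected` counts per node. The base URL `http://localhost:9200`, the queue threshold, and the helper names `fetch_management_pool` and `backlogged_nodes` are assumptions for illustration, not something from our actual setup.

```python
import json
import urllib.request


def fetch_management_pool(base_url="http://localhost:9200"):
    """Fetch per-node stats for the management thread pool as a list of dicts.

    base_url is an assumed address; point it at any node in the cluster.
    """
    url = (
        base_url
        + "/_cat/thread_pool/management"
        + "?format=json&h=node_name,name,active,queue,rejected"
    )
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def backlogged_nodes(rows, queue_threshold=0):
    """Return (node_name, queue, rejected) for nodes whose queue exceeds the threshold.

    The cat API returns numbers as strings in JSON, so they are converted here.
    """
    return [
        (row["node_name"], int(row["queue"]), int(row["rejected"]))
        for row in rows
        if int(row["queue"]) > queue_threshold
    ]


if __name__ == "__main__":
    # Hypothetical threshold; pick whatever counts as a backlog for your cluster.
    for node, queue, rejected in backlogged_nodes(fetch_management_pool(), 10):
        print(f"{node}: queue={queue} rejected={rejected}")
```

Running this on a schedule while ramping up the indexing rate should show which data nodes start queueing management tasks first.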