It looks like you are running into heap pressure and that garbage collection is taking a very long time, which is causing problems. Can you please provide the full output of the cluster stats API? Can you also check whether the VMs are deployed with over-provisioned CPU or use memory ballooning, as either can cause very slow GC? It is very important that Elasticsearch always has access to the memory allocated to it (heap and off-heap) and that this memory is not swapped out to disk.
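If it helps, here is a minimal Python sketch for collecting that information. It assumes an unsecured node reachable at `http://localhost:9200` (adjust the URL and add authentication for your cluster). The helper names (`fetch_json`, `nodes_without_mlockall`, `heap_and_old_gc`, `main`) are made up for illustration. It uses the cluster stats, nodes info and nodes stats REST endpoints, and the `old` collector name assumes the usual JVM GC naming in the nodes stats output.

```python
import json
from urllib.request import urlopen

# Assumption: unsecured node on localhost; change this for your cluster.
BASE_URL = "http://localhost:9200"


def fetch_json(path, base_url=BASE_URL):
    """GET an Elasticsearch REST endpoint and decode the JSON body."""
    with urlopen(base_url + path, timeout=30) as resp:
        return json.load(resp)


def nodes_without_mlockall(nodes_info):
    """Return names of nodes whose process section reports mlockall as false."""
    return sorted(
        node.get("name", node_id)
        for node_id, node in nodes_info.get("nodes", {}).items()
        if not node.get("process", {}).get("mlockall", False)
    )


def heap_and_old_gc(nodes_stats):
    """Map node name -> (heap_used_percent, old-gen GC time in ms)."""
    result = {}
    for node_id, node in nodes_stats.get("nodes", {}).items():
        jvm = node.get("jvm", {})
        old = jvm.get("gc", {}).get("collectors", {}).get("old", {})
        result[node.get("name", node_id)] = (
            jvm.get("mem", {}).get("heap_used_percent"),
            old.get("collection_time_in_millis"),
        )
    return result


def main():
    # Full cluster stats output, as requested above.
    print(json.dumps(fetch_json("/_cluster/stats?human"), indent=2))
    # Nodes that have not locked their memory (may be swappable).
    print(nodes_without_mlockall(fetch_json("/_nodes/process")))
    # Heap usage and accumulated old-generation GC time per node.
    print(heap_and_old_gc(fetch_json("/_nodes/stats/jvm")))
```

Call `main()` from a machine that can reach the cluster. Note that `mlockall: false` only means the memory is not locked; the node may still be safe if swap is disabled at the OS level, so treat it as a hint rather than proof of swapping.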