We are having some problems with our clusters. Every night a process sends updates to our data through Logstash, so the updates go through the bulk API. The volume varies from night to night but can reach a few million documents (the documents are small, under 1 KB each). In the morning we find most of the nodes at about 74% memory usage, just below the threshold at which GC kicks in, and the cluster is very slow: average search time goes from under 0.5 s to several seconds.

We see the same behavior on our test cluster (AWS m4.large instances) and on our prod cluster (AWS m4.xlarge instances). It also happens whether the heap is set to 50% of RAM or to smaller values. Restarting the ES process always fixes the problem until the next indexing run.

Our clusters are not live yet, so they don't receive regular search traffic, although I don't see why that would cause this. As far as I know, we are running with default settings.

Any advice would be much appreciated.