We have a cluster on cloud.elastic.co, which went down due to JVM heap memory pressure (my assumption).
Following is my cluster configuration:
2 data nodes (8 GB each)
1 master node (8 GB)
The cluster was not able to process the write requests sent from our application (which uses a 30-second timeout), and all of the requests timed out. According to the cloud.elastic support team, they found our cluster in an unhealthy state and restarted the node, but nothing changed; the write requests were still timing out.
Not sure what happened here. Has anyone ever faced such an issue?
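As an illustration only (not from the original post), here is a minimal sketch of how the heap-pressure assumption could be checked from the application side, assuming the `@elastic/elasticsearch` v8 client; the endpoint URL and credentials are placeholders:

```ts
import { Client } from '@elastic/elasticsearch'

// Placeholder endpoint and credentials; replace with your Elastic Cloud values.
const client = new Client({
  node: 'https://<your-cloud-endpoint>:9243',
  auth: { username: 'elastic', password: '<password>' },
})

async function checkHeap() {
  // Per-node JVM stats; heap_used_percent is the key number for memory pressure.
  const stats = await client.nodes.stats({ metric: 'jvm' })
  for (const [nodeId, node] of Object.entries(stats.nodes)) {
    console.log(nodeId, node.jvm?.mem?.heap_used_percent, '% heap used')
  }
}

checkHeap().catch(console.error)
```

With the older v7 client the response is wrapped in a `body` property, so the loop would read from `stats.body.nodes` instead.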
We have only 1 index with 5 shards. All of the shards are actively used, and we have a write-heavy application. We do not use bulk requests for any write operation.
OK, that does eliminate a common source of problems. Indexing and updating a lot of single documents can be quite inefficient and cause a lot of disk I/O, as the translog will be synced for each request.
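For illustration (not from the original reply), a minimal sketch of batching those single writes into one bulk request with the `@elastic/elasticsearch` v8 client (v7 takes the same array under `body` and wraps the response); the index name `my-index` and the document shape are assumptions:

```ts
import { Client } from '@elastic/elasticsearch'

const client = new Client({ node: 'https://<your-cloud-endpoint>:9243' /* plus auth */ })

interface Doc {
  id: string
  payload: unknown
}

// Instead of one index request per document (one translog sync each),
// collect the pending documents and send a single bulk request.
async function indexBatch(docs: Doc[]) {
  const operations = docs.flatMap((doc) => [
    { index: { _index: 'my-index', _id: doc.id } }, // action line
    doc,                                            // document line
  ])

  const response = await client.bulk({ operations, refresh: false })
  if (response.errors) {
    // Bulk responses report failures per item, so inspect each result.
    const failed = response.items.filter((item) => item.index?.error)
    console.error(`bulk: ${failed.length} of ${docs.length} documents failed`)
  }
}
```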
So might the same apply to bulk requests as well?
Also, I am not sure why the write requests were timing out while the read requests were still being served.
We tried writing a single document manually from Kibana, and that did work, but it did not work from our application. This is the confusing part: writes from Kibana work, but not from the application. (We are using the Node.js Elasticsearch client.)
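For comparison with the Kibana test, here is a minimal sketch of the same single-document write from the Node.js client, with the 30-second request timeout made explicit. It assumes the `@elastic/elasticsearch` v8 client (the older `elasticsearch` package uses a different API); the endpoint, API key, and index name are placeholders:

```ts
import { Client } from '@elastic/elasticsearch'

const client = new Client({
  node: 'https://<your-cloud-endpoint>:9243',
  auth: { apiKey: '<api-key>' },
  requestTimeout: 30000, // matches the 30-second timeout mentioned above
})

async function writeOne() {
  // Roughly the equivalent of POST my-index/_doc in Kibana Dev Tools.
  const result = await client.index({
    index: 'my-index',
    document: { message: 'hello', '@timestamp': new Date().toISOString() },
  })
  console.log(result.result) // 'created' or 'updated'
}

writeOne().catch((err) => {
  // A TimeoutError here points at the client/network side rather than the query itself.
  console.error(err.name, err.message)
})
```

If this succeeds outside the application but times out inside it, the difference is likely in the client configuration or the network path rather than in the cluster itself.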