Elastic Cluster Went Down

Hi Team,

We have a cluster on cloud.elastic.co which went down due to JVM heap memory pressure (my assumption; a sketch for checking this is included after the configuration below).

Following is my cluster configuration:

  • 2 data nodes (8 GB each)
  • 1 master node (8 GB)
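For what it's worth, heap pressure can be checked via the cat nodes API. Below is a minimal sketch assuming the v7 @elastic/elasticsearch JavaScript client, with the cloud ID and credentials as placeholders rather than our real deployment details:

```ts
import { Client } from '@elastic/elasticsearch'

// Placeholder cloud ID and credentials, not our real deployment details.
const client = new Client({
  cloud: { id: '<deployment-name>:<base64-cloud-id>' },
  auth: { username: 'elastic', password: '<password>' },
})

async function checkHeap(): Promise<void> {
  // heap.percent consistently near 100 on a node indicates heap pressure.
  const { body } = await client.cat.nodes({
    h: 'name,heap.percent,heap.max,ram.percent',
    v: true,
  })
  console.log(body) // plain-text table, one row per node
}

checkHeap().catch(console.error)
```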

The cluster was not able to process the write requests sent from our application with a timeout of 30 seconds; all of the requests timed out. According to the cloud.elastic.co support team, they found our cluster in an unhealthy state and restarted the nodes, but nothing changed: the write requests were still timing out.

I am not sure what happened here. Has anyone ever faced such issues?

Thanks,

How many indices and shards do you have in the cluster? How many of these are you actively indexing into? What is the size of your bulk requests?

Hi @Christian_Dahlqvist,

We have only 1 index with 5 shards. All of the shards are actively written to, and the application is write-heavy. We do not use bulk requests; every document is written with an individual request.

OK, that does eliminate a common source of problems. However, indexing and updating a lot of single documents can be quite inefficient and cause a lot of disk I/O, as the translog will be synced for each request.
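If the application can batch writes, a single bulk request amortizes that per-request translog sync across the whole batch. Here is a minimal sketch assuming the v7 @elastic/elasticsearch JavaScript client; the endpoint, index name, and document shape are placeholders for illustration:

```ts
import { Client } from '@elastic/elasticsearch'

// Placeholder endpoint, not a real deployment.
const client = new Client({ node: 'https://localhost:9200' })

// Hypothetical documents standing in for the payloads the application
// currently sends as individual index requests.
const docs = [
  { user: 'alice', message: 'first event' },
  { user: 'bob', message: 'second event' },
]

async function bulkIndex(): Promise<void> {
  // Alternate action and document entries, as the bulk API expects.
  // One bulk request means one translog sync for the whole batch,
  // instead of one per document.
  const body = docs.flatMap((doc) => [{ index: { _index: 'my-index' } }, doc])
  const { body: response } = await client.bulk({ body })

  if (response.errors) {
    // Individual items can still fail inside an otherwise successful bulk call.
    const failed = response.items.filter((item: any) => item.index && item.index.error)
    console.error('bulk items failed:', failed.length)
  }
}

bulkIndex().catch(console.error)
```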

So would the same be applicable to bulk requests as well?

Also, I am not sure why the write requests were timing out while the read requests were still being served.

We tried writing a single document manually from Kibana, and that did work, but writes from our application did not. This is the confusing part: writes work from Kibana but not from the application. (We are using the Node.js Elasticsearch client.) A simplified sketch of our write path is below.
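Here is roughly how a single write from the application looks. This is a simplified sketch assuming the v7 @elastic/elasticsearch client, with the endpoint, credentials, index name, and document as placeholders rather than our actual code:

```ts
import { Client } from '@elastic/elasticsearch'

const client = new Client({
  node: 'https://<deployment-endpoint>:9243', // placeholder endpoint
  auth: { username: 'elastic', password: '<password>' }, // placeholder credentials
  requestTimeout: 30000, // the 30-second timeout mentioned above
})

async function writeOne(): Promise<void> {
  try {
    const { body } = await client.index({
      index: 'my-index', // placeholder index name
      body: { message: 'single test document', ts: new Date().toISOString() },
    })
    console.log('indexed:', body.result)
  } catch (err: any) {
    // Log the full error: a TimeoutError here, while reads keep working,
    // matches what we are seeing.
    console.error(err.name, err.meta && err.meta.statusCode, err.meta && err.meta.body)
  }
}

writeOne().catch(console.error)
```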

Thanks,
