Kibana 4.4 randomly becomes non-responsive

Hi,

I have five Kibana instances (v4.4) fronted by a load balancer. Every now and then, one instance becomes non-responsive. When I hit that host directly with curl, it waits for a long time and then returns "curl: (52) Empty reply from server". My Kibana instances each connect to their local tribe node, and I can confirm all the tribe nodes are running fine.

My ES cluster is healthy and in a green state, and the other four Kibana instances are running fine. If I restart the bad Kibana instance, it goes back to normal, so this is definitely not related to ES cluster health.

Has anyone else experienced a similar issue? The kibana.log does not show much; I sometimes, but not always, see the following entry when the host becomes non-responsive:

{"type":"log","@timestamp":"2016-06-08T07:02:33+00:00","tags":["status","plugin:elasticsearch","error"],"pid":1,"name":"plugin:elasticsearch","state":"red","message":"Status changed from red to red - Request Timeout after 1500ms","prevState":"red","prevMsg":"Request Timeout after 30000ms"}

One other thing worth mentioning: no users are on this Kibana yet. The only traffic is a health check that runs every minute and checks Kibana's health via the response code of the /status endpoint.
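Roughly, the check looks like this (a sketch only; the hostname, port 5601, and the 10-second curl timeout are illustrative, the real check just looks at the /status response code once a minute):

# Cron job, runs every minute; anything other than HTTP 200 from /status marks the instance unhealthy.
code=$(curl -s -o /dev/null -m 10 -w "%{http_code}" http://kibana-host:5601/status)
if [ "$code" != "200" ]; then
  echo "Kibana unhealthy: /status returned HTTP $code"
fi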

Thank you in advance for your help.

Do you know what the available memory on the machine looks like when this occurs? Wondering if it's paging.

Kibana is running inside a Docker container, and I am not aware of any special Docker configuration.
The host machine is an Oracle Server X5-2L with 503 GB of RAM, of which 400+ GB is consistently free.

It looks like I actually created a ticket for this, but unfortunately it hasn't been resolved. https://github.com/elastic/kibana/issues/6761

So far I have only seen it happen once, which makes it hard to investigate when it's not easily reproducible. Any additional information you could add to the ticket would be appreciated.

It happened to one of my hosts again this morning. The last log line is:

{"type":"log","@timestamp":"2016-06-10T12:28:31+00:00","tags":["status","plugin:elasticsearch","error"],"pid":1,"name":"plugin:elasticsearch","state":"red","message":"Status changed from red to red - Request Timeout after 1500ms","prevState":"red","prevMsg":"Request Timeout after 30000ms"}

My cluster is definitely green, and the other four Kibana instances are in good shape. I am going to try increasing the request timeout to see if that helps.
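For reference, the two timeouts in that log line map to settings in kibana.yml (the 1500ms one is the ping timeout); the values below are just an example of raising them, not a recommendation:

# kibana.yml - raise the Elasticsearch timeouts from the values seen in the log (30000ms / 1500ms)
elasticsearch.requestTimeout: 60000
elasticsearch.pingTimeout: 5000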

I upgraded to version 4.5.1, but the issue persists. Any help is appreciated.

I run into this issue all the time, too. I leave Kibana running and at some random point it disconnects from Elasticsearch with the above error (my request timeout is 1500ms). There are no log messages or errors before it, just a random timeout and then nothing.

I tried setting --max-old-space-size as suggested in https://github.com/elastic/kibana/issues/5170, and that did not help either.
The process is alive, but Kibana is still not responsive.
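For anyone trying the same workaround, one way to pass that flag is straight to the bundled Node binary (a sketch only; it assumes the stock Kibana 4.x layout with Node under ./node and the entry point at src/cli, and the 250 MB value is arbitrary):

# From the Kibana install directory: cap the V8 old-generation heap at ~250 MB
./node/bin/node --max-old-space-size=250 ./src/cli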

I moved all my Kibana instances out of Docker, and the issue has not occurred since.