Kibana time-out errors

Hello,

I am testing Elasticsearch before deploying it in a production cluster. For the first few days it worked perfectly, but then I started having trouble when searching logs in Kibana: responses were taking a long time, and now it barely works at all. I am getting timeout errors and then Kibana crashes.

Here are some errors that I am getting in Kibana:

server    log   [09:24:33.353] [warning][kibana-monitoring][monitoring][monitoring][plugins] Unable to bulk upload the stats payload to the local cluster
server    log   [09:24:35.436] [error][plugins][taskManager] Failed to poll for work: Error: work has timed out
server    log   [09:24:37.741] [error][elasticsearch][taskManager] [TimeoutError]: Request timed out
server    log   [09:24:43.193] [warning][kibana-monitoring][monitoring][monitoring][plugins] Error: [cluster_block_exception] blocked by: [SERVICE_UNAVAILABLE/2/no master];
    at respond (/home/ELK-8_0_0/kibana/node_modules/elasticsearch/src/lib/transport.js:349:15)
    at checkRespForFailure (/home/ELK-8_0_0/kibana/node_modules/elasticsearch/src/lib/transport.js:306:7)
    at HttpConnector.<anonymous> (/home/ELK-8_0_0/kibana/node_modules/elasticsearch/src/lib/connectors/http.js:173:7)
    at IncomingMessage.wrapper (/home/ELK-8_0_0/kibana/node_modules/lodash/lodash.js:4949:19)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)
    at process._tickCallback (internal/process/next_tick.js:63:19)
server    log   [09:24:43.193] [warning][kibana-monitoring][monitoring][monitoring][plugins] Unable to bulk upload the stats payload to the local cluster
server    log   [09:24:52.008] [error][index][plugins][security][session] Failed to check if session index template exists: Request Timeout after 30000ms
Unhandled Promise rejection detected:

{ Error: Request Timeout after 30000ms
    at /home/ELK-8_0_0/kibana/node_modules/elasticsearch/src/lib/transport.js:397:9
    at Timeout.<anonymous> (/home/ELK-8_0_0/kibana/node_modules/elasticsearch/src/lib/transport.js:429:7)
    at ontimeout (timers.js:436:11)
    at tryOnTimeout (timers.js:300:5)
    at listOnTimeout (timers.js:263:5)
    at Timer.processTimers (timers.js:223:10)
  status: undefined,
  displayName: 'RequestTimeout',
  message: 'Request Timeout after 30000ms',
  body: false }

Terminating process...
 server crashed  with status code 1

And the warnings on an Elasticsearch node are:

[2020-11-16T09:39:36,599][WARN ][o.e.t.TransportService   ] [MASTER-01] Received response for a request that has timed out, sent [25413ms] ago, timed out [15408ms] ago, action [internal:coordination/fault_detection/leader_check], node [{MASTER-03}{RCeMt0uXQie_ax_Sp22hLw}{ghlsACoWQKSva8mIlCKssQ}{X.X.X.X}{X.X.X.X:9300}{dilmrt}{ml.machine_memory=8365068288, ml.max_open_jobs=20, xpack.installed=true, data=hot, transform.node=true}], id [14671045]
[2020-11-16T09:39:36,600][WARN ][o.e.t.TransportService   ] [MASTER-01] Received response for a request that has timed out, sent [14407ms] ago, timed out [4402ms] ago, action [internal:coordination/fault_detection/leader_check], node [{MASTER-03}{RCeMt0uXQie_ax_Sp22hLw}{ghlsACoWQKSva8mIlCKssQ}{X.X.X.X}{X.X.X.X:9300}{dilmrt}{ml.machine_memory=8365068288, ml.max_open_jobs=20, xpack.installed=true, data=hot, transform.node=true}], id [14671080]
[2020-11-16T09:39:37,824][WARN ][o.e.m.f.FsHealthService  ] [MASTER-01] health check of [/var/lib/ELK-8_0_0] took [12006ms] which is above the warn threshold of [5s]
[2020-11-16T09:40:14,831][WARN ][o.e.t.InboundHandler     ] [MASTER-01] handling inbound transport message [InboundMessage{Header{5844}{8.0.0}{10824701}{true}{false}{false}{false}{indices:data/write/bulk[s][r]}}] took [6203ms] which is above the warn threshold of [5000ms]
[2020-11-16T09:40:14,832][WARN ][o.e.t.InboundHandler     ] [MASTER-01] handling inbound transport message [InboundMessage{Header{5645}{8.0.0}{42084006}{true}{false}{false}{false}{indices:data/read/search[phase/query]}}] took [6203ms] which is above the warn threshold of [5000ms]

Could you give me some advice on how to solve these errors?
For context, I have 5 Elasticsearch nodes: 3 master nodes and 2 data nodes,
and the JVM heap is set to 4 GB on each node.
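As a quick sanity check on that 4 GB heap: one of the log lines above reports ml.machine_memory=8365068288, i.e. roughly 7.8 GiB of RAM per node, and the usual Elasticsearch guidance is to keep the JVM heap at or below about half of physical RAM so the OS page cache gets the rest. A minimal back-of-the-envelope sketch (the 50% figure is the general sizing guideline, not a hard limit):

```python
# Sanity check: heap size vs. physical RAM per node.
# 8365068288 is the ml.machine_memory value reported in the log lines above;
# the guideline is to keep the JVM heap at or below ~50% of RAM.

machine_memory_bytes = 8365068288
heap_gib = 4  # 4 GB heap as described in the post

ram_gib = machine_memory_bytes / 2**30
heap_fraction = heap_gib / ram_gib

print(f"RAM: {ram_gib:.2f} GiB, heap: {heap_gib} GiB ({heap_fraction:.0%} of RAM)")
```

So a 4 GB heap on these nodes is already slightly over the 50% guideline, which leaves relatively little memory for the filesystem cache that Lucene depends on.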

Thanks

It seems like this is more of an Elasticsearch problem than a Kibana problem, so I moved the post to the other forum.

4 GB of RAM is pretty low; maybe your cluster is not able to handle its load? How many documents are you ingesting? How is the memory usage of Elasticsearch? Is it possible there are networking problems among the hosts?

Thanks for your answer @flash1293
To give you a bit more information about my cluster, here is the average ingest rate for each beat:

Winlogbeat --> 162 docs / 30 seconds
Packetbeat --> 5482 docs / 30 seconds
Logstash pipeline --> 5000 docs / 30 seconds
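Summing those three averages gives a rough overall ingest rate for the cluster; a small back-of-the-envelope calculation:

```python
# Rough overall ingest rate from the per-beat averages above
# (each figure is documents per 30-second window).
rates_per_30s = {
    "winlogbeat": 162,
    "packetbeat": 5482,
    "logstash_pipeline": 5000,
}

docs_per_second = sum(rates_per_30s.values()) / 30
docs_per_day = docs_per_second * 86400  # seconds in a day

print(f"{docs_per_second:.1f} docs/s, ~{docs_per_day / 1e6:.1f}M docs/day")
```

That works out to roughly 355 docs/s, or on the order of 30 million documents per day, which two data nodes with 4 GB heaps may struggle to index while also serving searches.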

When I run htop on the different machines where Elasticsearch is installed (on Debian 10), I don't see any memory problems.
And I don't think there is a network problem between the nodes, as they are all on the same LAN.
I hope this information helps you tell me whether I am doing something wrong in my cluster.
Thanks again

If you aren't running monitoring yet (https://www.elastic.co/guide/en/elasticsearch/reference/current/monitoring-overview.html), it would probably help to give you some visibility into what's happening inside your cluster. I'm no expert on performance tuning, but everything you've written so far seems to point to performance/capacity issues, so this looks like a good starting point to me.
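For what it's worth, enabling legacy self-monitoring collection is a single setting in elasticsearch.yml (note that collecting via Metricbeat is the recommended route in recent versions, so treat this as the quick-start option):

```yaml
# elasticsearch.yml - enable legacy self-monitoring collection
xpack.monitoring.collection.enabled: true
```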


Thanks for your answer,
I will try to add monitoring to my cluster and keep you updated.

Hi again,
I activated monitoring in my cluster, and even after stopping all the Beats I still get timeout warnings.

An example of the results I got for a node in my cluster is shown below:


and the errors I get on that node are:

[2020-11-25T10:17:05,674][WARN ][o.e.m.f.FsHealthService  ] [VSELK-DATA-01] health check of [/var/lib/elasticsearch] took [5803ms] which is above the warn threshold of [5s]
[2020-11-25T10:36:50,050][WARN ][o.e.t.InboundHandler     ] [VSELK-DATA-01] handling inbound transport message [InboundMessage{Header{738}{8.0.0}{3053761}{true}{false}{false}{false}{indices:data/read/search[can_match]}}] took [6002ms] which is above the warn threshold of [5000ms]
[2020-11-25T10:37:00,588][WARN ][o.e.t.InboundHandler     ] [VSELK-DATA-01] handling inbound transport message [InboundMessage{Header{22820}{8.0.0}{3292365}{true}{false}{false}{false}{indices:data/read/search[phase/query]}}] took [6804ms] which is above the warn threshold of [5000ms]
[2020-11-25T10:37:00,589][WARN ][o.e.t.InboundHandler     ] [VSELK-DATA-01] handling inbound transport message [InboundMessage{Header{22816}{8.0.0}{3292360}{true}{false}{false}{false}{indices:data/read/search[phase/query]}}] took [7489ms] which is above the warn threshold of [5000ms]
[2020-11-25T10:37:02,393][WARN ][o.e.t.InboundHandler     ] [VSELK-DATA-01] handling inbound transport message [InboundMessage{Header{4121}{8.0.0}{3292351}{true}{false}{false}{false}{indices:data/write/bulk[s][r]}}] took [9890ms] which is above the warn threshold of [5000ms]
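Those FsHealthService warnings mean that writing and fsyncing a small file on the data path took several seconds, which points at slow storage rather than at Elasticsearch itself. A rough way to reproduce that check by hand; this is only a sketch (the helper name, file size, and default path are illustrative, not what Elasticsearch itself uses), and you would point `path` at the actual data path such as /var/lib/elasticsearch:

```python
import os
import tempfile
import time

def fsync_latency(path="/tmp", payload=b"x" * 4096):
    """Time a small write + fsync on the given directory, in seconds.

    Mimics the spirit of Elasticsearch's filesystem health check, which
    warns when this kind of operation takes longer than 5 seconds.
    """
    fd, name = tempfile.mkstemp(dir=path)
    try:
        start = time.monotonic()
        os.write(fd, payload)
        os.fsync(fd)  # force the data to physical storage
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(name)

print(f"fsync of 4 KiB took {fsync_latency() * 1000:.1f} ms")
```

If this regularly takes more than a few hundred milliseconds on the data path, the disks (or the virtualization/storage layer underneath them) are likely the bottleneck.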

The overview of all the nodes is:

Thanks for your help

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.