Kibana's stack-monitoring partly broken after stack upgrade from 7.13 to 7.15

heikis · October 8, 2021, 6:42am

Hello. We upgraded our Elastic Stack (Elasticsearch nodes, Logstash, Beats, Kibana) from 7.13 to 7.15.0.
Everything works well except that in Kibana's Stack Monitoring (Cluster Overview) we get timeouts when viewing the Elasticsearch "Overview" for a time period longer than 30 minutes (this is the page that shows search rate and latency, and indexing rate and latency). The page keeps loading but no content appears; only the Kibana header, banner, and menu are shown, along with a "Loading..." text.
"Last 15 minutes" and "Last 30 minutes" are okay, but any longer period does not work.

Everything else under Cluster Overview works: I can see the Elasticsearch nodes and every other overview for any time period.

After waiting for a while, the Kibana GUI usually shows this message:

Monitoring Request Error
An HTTP request has failed to connect. Please check if the Kibana server is running and that your browser has a working connection, or contact your system administrator.

The Kibana logs repeatedly show this error while the page is loading:

{"type":"log","@timestamp":"2021-10-08T09:06:26+03:00","tags":["error","plugins","monitoring","monitoring"],"pid":598389,"message":"TimeoutError: Request timed out\n    at ClientRequest.onTimeout (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Connection.js:110:16)\n    at ClientRequest.emit (events.js:400:28)\n    at TLSSocket.emitRequestTimeout (_http_client.js:790:9)\n    at Object.onceWrapper (events.js:519:28)\n    at TLSSocket.emit (events.js:412:35)\n    at TLSSocket.Socket._onTimeout (net.js:484:8)\n    at listOnTimeout (internal/timers.js:557:17)\n    at processTimers (internal/timers.js:500:7) {\n  meta: {\n    body: null,\n    statusCode: null,\n    headers: null,\n    meta: {\n      context: null,\n      request: [Object],\n      name: 'elasticsearch-js',\n      connection: [Object],\n      attempts: 3,\n      aborted: false\n    }\n  }\n}"}

Any tips or ideas about what might be wrong? Should I look into relaxing some timeouts? We did not have this problem on 7.13. Thanks!

heikis · October 12, 2021, 10:51am

After deleting the .monitoring-es* indices I could set the "Elasticsearch overview" time period to 24h with no problem, although without any historic data since I had just deleted it. But overnight the new .monitoring-es-* index grew to several gigabytes, and the Stack Monitoring page again hangs when requesting an Elasticsearch overview of longer than 30 minutes.
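
To watch how quickly the monitoring indices grow back, something like the following sketch could be used. It relies on the `_cat/indices` API with `format=json` and `bytes=b`; the base URL is a placeholder, and authentication and TLS settings are left out, so it would need adjusting for a secured cluster.

```python
import json
import urllib.request

# Placeholder address of the monitoring cluster (an assumption, not from the thread).
ES_URL = "http://localhost:9200"


def cat_indices_url(base_url, pattern=".monitoring-es-*"):
    """Build a _cat/indices URL that returns JSON rows with sizes in bytes."""
    return (
        f"{base_url}/_cat/indices/{pattern}"
        "?format=json&bytes=b&h=index,store.size,docs.count"
    )


def total_store_bytes(rows):
    """Sum store.size over all rows, skipping rows without a size (e.g. closed indices)."""
    return sum(int(r["store.size"]) for r in rows if r.get("store.size") is not None)


def fetch_rows(url):
    """Fetch and decode the JSON rows from the cluster (no auth handling here)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def main():
    rows = fetch_rows(cat_indices_url(ES_URL))
    for row in rows:
        print(row["index"], row["store.size"], row["docs.count"])
    print("total bytes:", total_store_bytes(rows))
```

Calling `main()` prints one line per matching index plus the combined size, which makes it easy to compare the index size against the point where the Overview page starts to hang.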

donasmello · October 28, 2021, 11:48am

Experiencing the same issue. I am able to query other sections of the monitoring page very quickly: Elasticsearch nodes, indices, Logstash overview, nodes, pipelines, Kibana overview, instances. Only the overview page for Elasticsearch monitoring hangs when querying any time range longer than 15 minutes. Searching an hour back takes about 30 seconds to complete. However, searching 24 hours back or 7 days back on a nodes page takes about 500ms and 3 seconds respectively. Query from the slowlog:

{"type": "index_search_slowlog", "timestamp": "2021-10-28T06:53:00,460-05:00", "level": "WARN", "component": "i.s.s.fetch", "cluster.name": "monitoring", "node.name": "monitoring_node", "message": "[.monitoring-es-7-mb-2021.10.28][0]", "took": "26.2s", "took_millis": "26280", "total_hits": "-1", "types": "[]", "stats": "[]", "search_type": "QUERY_THEN_FETCH", "total_shards": "9", "source": "{\"size\":10000,\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"should\":[{\"term\":{\"type\":{\"value\":\"index_recovery\",\"boost\":1.0}}},{\"match_none\":{\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}},{\"term\":{\"cluster_uuid\":{\"value\":\"LnaLeytMQsCe8oJRW1QHrQ\",\"boost\":1.0}}},{\"range\":{\"timestamp\":{\"from\":1635418354072,\"to\":1635421954072,\"include_lower\":true,\"include_upper\":true,\"format\":\"epoch_millis\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[\"elasticsearch.index.recovery\",\"@timestamp\"],\"excludes\":[]},\"sort\":[{\"timestamp\":{\"order\":\"desc\",\"unmapped_type\":\"long\"}}],\"aggregations\":{\"max_timestamp\":{\"max\":{\"field\":\"@timestamp\"}}}}", "id": "71586111-9e9e-42e2-a875-7f1666bbccdb", "cluster.uuid": "EKbvKS5eQU6HBS2Hju_I3A", "node.id": "2QnFXpV6SxuX4_Q0F1fRxw"  }
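
For readers comparing their own slowlog entries, here is a small sketch that pulls the page size and the queried time window out of a slowlog `source` string. The sample below is a trimmed copy of the query above (only the `size`, `cluster_uuid`, and `range` parts are kept); the function name and the output keys are made up for illustration.

```python
import json

# Trimmed version of the "source" field from the slowlog entry above.
SAMPLE_SOURCE = (
    '{"size":10000,"query":{"bool":{"filter":['
    '{"term":{"cluster_uuid":{"value":"LnaLeytMQsCe8oJRW1QHrQ"}}},'
    '{"range":{"timestamp":{"from":1635418354072,"to":1635421954072}}}'
    ']}}}'
)


def summarize_slowlog_source(source):
    """Return the requested page size and the timestamp window in minutes."""
    body = json.loads(source)
    window_ms = None
    # Look for a range filter on "timestamp" among the bool filter clauses.
    for clause in body["query"]["bool"]["filter"]:
        rng = clause.get("range", {}).get("timestamp")
        if rng:
            window_ms = rng["to"] - rng["from"]
    minutes = window_ms / 60000 if window_ms is not None else None
    return {"size": body.get("size"), "window_minutes": minutes}


summary = summarize_slowlog_source(SAMPLE_SOURCE)
print(summary)
```

For this entry the window works out to 60 minutes with a page size of 10000, so the slow request is a fetch of up to 10000 `index_recovery` documents for a one-hour range.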

spiqueras · November 17, 2021, 1:10pm

Same issue here, using ES 7.14. All monitoring pages work fine except the Overview page.

donasmello · December 7, 2021, 10:26pm

I have a case open with Elastic support to try to resolve this issue. I will relay any information that comes out of it here.

Sandra · December 9, 2021, 12:48am

The slowness you are experiencing is most likely caused by an issue introduced in 7.14, where the .monitoring indices are queried unnecessarily. We hope to have a fix in 7.16.2.

donasmello · December 16, 2021, 2:20pm

Thanks for the information Sandra! I have actually just upgraded our monitoring node to 7.16.1 and it resolved the issue.

Sandra · December 16, 2021, 2:40pm

The change did go out in 7.16.1. Glad to hear it!
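
As a rough aid for checking a fleet of clusters, the version range mentioned in this thread (introduced in 7.14, fixed in 7.16.1) can be expressed as a small helper. This is only a sketch based on the statements above, not an official compatibility matrix, and the function names are hypothetical.

```python
def parse_version(version):
    """Turn '7.15.0' or '7.14' into a 3-part tuple of ints, padding with zeros."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts[:3])


def likely_affected(version):
    """True if the version falls in the range this thread reports as affected."""
    return (7, 14, 0) <= parse_version(version) < (7, 16, 1)


for v in ("7.13.4", "7.14", "7.15.0", "7.16.1"):
    print(v, likely_affected(v))
```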

heikis · December 17, 2021, 9:54am

Upgraded to 7.16.1 and the problem is solved. Thank you, Sandra, for the hint!

system · January 14, 2022, 9:55am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
