Kibana Stack Monitoring partly broken after stack upgrade from 7.13 to 7.15

Hello. We upgraded our Elastic stack (Elasticsearch nodes, Logstash, Beats, Kibana) from 7.13 to 7.15.0.
Everything is working well except that in Kibana's Stack Monitoring (Cluster Overview) we get timeouts when trying to view the Elasticsearch "Overview" for any time period longer than 30 minutes (the page that shows the overview stats for search rate and latency, and indexing rate and latency). The page just keeps loading and no content is shown; only the header, banner and menu of the Kibana GUI appear, together with a "Loading..." text.
Last 15 minutes or Last 30 minutes is okay, but any longer period no longer works.

Everything else under Cluster Overview is working; I can view Elasticsearch Nodes or any other overview for any time period.

After waiting a while, the Kibana GUI usually shows this message:

Monitoring Request Error
An HTTP request has failed to connect. Please check if the Kibana server is running and that your browser has a working connection, or contact your system administrator.

While the page is loading, the Kibana logs repeatedly show this error:

{"type":"log","@timestamp":"2021-10-08T09:06:26+03:00","tags":["error","plugins","monitoring","monitoring"],"pid":598389,"message":"TimeoutError: Request timed out\n    at ClientRequest.onTimeout (/usr/share/kibana/node_modules/@elastic/elasticsearch/lib/Connection.js:110:16)\n    at ClientRequest.emit (events.js:400:28)\n    at TLSSocket.emitRequestTimeout (_http_client.js:790:9)\n    at Object.onceWrapper (events.js:519:28)\n    at TLSSocket.emit (events.js:412:35)\n    at TLSSocket.Socket._onTimeout (net.js:484:8)\n    at listOnTimeout (internal/timers.js:557:17)\n    at processTimers (internal/timers.js:500:7) {\n  meta: {\n    body: null,\n    statusCode: null,\n    headers: null,\n    meta: {\n      context: null,\n      request: [Object],\n      name: 'elasticsearch-js',\n      connection: [Object],\n      attempts: 3,\n      aborted: false\n    }\n  }\n}"}

Any tips or ideas on what might be wrong? Should I look into relaxing some timeouts? We did not have this problem on 7.13. Thanks!
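
For reference, one knob that can be relaxed on the Kibana side is the request timeout towards Elasticsearch. A minimal sketch of that change in kibana.yml, assuming the default 30-second elasticsearch.requestTimeout is the timer expiring in the error above (the value is in milliseconds, and raising it only masks the slow query rather than fixing it):

    # kibana.yml -- raise the Elasticsearch request timeout (default 30000 ms; assumption: this is the limit being hit)
    elasticsearch.requestTimeout: 120000

Kibana has to be restarted for the setting to take effect.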

After deleting the .monitoring-es* indices I could set the Elasticsearch "Overview" time period to 24h with no problem, although there was of course no historic data since I had just deleted it. But overnight the new .monitoring-es-* index grew to several gigabytes, and the Stack Monitoring page again hangs when requesting the Elasticsearch overview for anything longer than 30 minutes.
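
In case it is useful to anyone else, roughly the Dev Tools requests I would use to check the size of the monitoring indices and to delete them; the .monitoring-es-7-* pattern is an assumption based on the default index naming, and deleting the indices of course throws away the monitoring history:

    # list the monitoring indices with their sizes, largest first
    GET _cat/indices/.monitoring-es-*?v&s=store.size:desc

    # delete them; new ones are created automatically as fresh monitoring data arrives
    DELETE .monitoring-es-7-*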

Experiencing the same issue. I can query other sections of the monitoring page very quickly: Elasticsearch nodes and indices, Logstash overview, nodes and pipelines, Kibana overview and instances. Only the Elasticsearch monitoring Overview page hangs when querying any time range greater than 15 minutes; searching an hour back takes about 30 seconds to complete. By contrast, searching 24 hours or 7 days back on a nodes page takes about 500 ms and 3 seconds respectively. Query from the slowlog:

{"type": "index_search_slowlog", "timestamp": "2021-10-28T06:53:00,460-05:00", "level": "WARN", "component": "i.s.s.fetch", "cluster.name": "monitoring", "node.name": "monitoring_node", "message": "[.monitoring-es-7-mb-2021.10.28][0]", "took": "26.2s", "took_millis": "26280", "total_hits": "-1", "types": "[]", "stats": "[]", "search_type": "QUERY_THEN_FETCH", "total_shards": "9", "source": "{\"size\":10000,\"query\":{\"bool\":{\"filter\":[{\"bool\":{\"should\":[{\"term\":{\"type\":{\"value\":\"index_recovery\",\"boost\":1.0}}},{\"match_none\":{\"boost\":1.0}}],\"adjust_pure_negative\":true,\"boost\":1.0}},{\"term\":{\"cluster_uuid\":{\"value\":\"LnaLeytMQsCe8oJRW1QHrQ\",\"boost\":1.0}}},{\"range\":{\"timestamp\":{\"from\":1635418354072,\"to\":1635421954072,\"include_lower\":true,\"include_upper\":true,\"format\":\"epoch_millis\",\"boost\":1.0}}}],\"adjust_pure_negative\":true,\"boost\":1.0}},\"_source\":{\"includes\":[\"elasticsearch.index.recovery\",\"@timestamp\"],\"excludes\":[]},\"sort\":[{\"timestamp\":{\"order\":\"desc\",\"unmapped_type\":\"long\"}}],\"aggregations\":{\"max_timestamp\":{\"max\":{\"field\":\"@timestamp\"}}}}", "id": "71586111-9e9e-42e2-a875-7f1666bbccdb", "cluster.uuid": "EKbvKS5eQU6HBS2Hju_I3A", "node.id": "2QnFXpV6SxuX4_Q0F1fRxw"  }

Same issue here, using ES 7.14. All monitoring pages work fine except the Overview page.

I have a case open with Elastic Support to try to resolve this issue. I will relay any information that comes of that here.

The slowness you are experiencing is most likely caused by an issue introduced in 7.14 where the .monitoring indices are queried unnecessarily; we hope to have a fix in 7.16.2.
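
For anyone who wants to confirm they are hitting the same thing before upgrading, a rough check is to count the index_recovery documents that the slow query above pulls back (its size of 10000 plus per-hit _source is what makes it expensive). This is only a sketch; the index pattern and the one-hour range are assumptions:

    GET .monitoring-es-*/_count
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "type": "index_recovery" } },
            { "range": { "timestamp": { "gte": "now-1h" } } }
          ]
        }
      }
    }

A large count for such a short window would line up with the Overview page timing out while the other monitoring pages stay fast.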

Thanks for the information, Sandra! I actually just upgraded our monitoring node to 7.16.1 and it resolved the issue.

The change did go out in 7.16.1. Glad to hear it!

Upgraded to 7.16.1 and the problem is solved 🙂 Thank you, Sandra, for the hint!
