Subject: In Kibana, the Metricbeat System Overview dashboard fails to display data and times out
Hi,
I have the following Elasticsearch and Kibana setup on an Ubuntu box:
OS: Ubuntu 16.10 (Yakkety)
Elasticsearch version: 6.2.3
Kibana version: 6.2.3
Metricbeat version: 6.2.3
RAM on Ubuntu: 24 GB
RAM allocated to Elasticsearch: 12 GB
CPU cores: 8
In Elasticsearch, a new index is created daily, and each daily index receives continuous performance data from Metricbeat.
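For scale context: in 6.x an index gets 5 primary shards by default (the Metricbeat template may override this), so a search over a 1-year range of daily indices can fan out to a very large number of shards. A quick way I could check the actual fan-out, assuming Elasticsearch is reachable on localhost:9200:

```shell
# Count how many metricbeat-* shards a wildcard search has to touch.
curl -s 'localhost:9200/_cat/shards/metricbeat-*?h=index,shard,prirep,state' | wc -l

# Per-index summary: primary shard count, doc count, and store size.
curl -s 'localhost:9200/_cat/indices/metricbeat-*?v&h=index,pri,docs.count,store.size'
```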
I imported the Metricbeat dashboards into Kibana as suggested by the Elastic documentation.
Now, when I go to the Dashboards section in Kibana and open the "System Overview" dashboard with the time range set to 1 year, the dashboard does not show any data.
Elasticsearch performance also degrades considerably; even a plain request to host:9200 takes noticeably long.
I get the following error in elasticsearch.log:
2019-02-12T04:32:06,350][DEBUG][o.e.a.s.TransportSearchAction] [AB6Tx6q] [metricbeat-2019.02.12][4], node[AB6Tx6qiS1uQC1YB1mWMQg], [P], s[STARTED], a[id=_pYPGRx-Rgy_jfGWezmYmw]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[metricbeat-*], indicesOptions=IndicesOptions[id=39, ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false], types=, routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=5, batchedReduceSize=512, preFilterShardSize=42, source={"size":0,"query":{"bool":{"must":[{"range":{"@timestamp":{"from":1549963071912,"to":1549963971912,"include_lower":true,"include_upper":true,"format":"epoch_millis","boost":1.0}}},{"bool":{"must":[{"query_string":{"query":"beat.name:"elk-server"","fields":,"type":"best_fields","default_operator":"or","max_determinized_states":10000,"enable_position_increments":true,"fuzziness":"AUTO","fuzzy_prefix_length":0,"fuzzy_max_expansions":50,"phrase_slop":0,"escape":false,"auto_generate_synonyms_phrase_query":true,"fuzzy_transpositions":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}}],"adjust_pure_negative":true,"boost":1.0}},"aggregations":{"d3c67db1-1b1a-11e7-b09e-037021c4f8df":{"filter":{"match_all":{"boost":1.0}},"aggregations":{"timeseries":{"date_histogram":{"field":"@timestamp","time_zone":"Asia/Kolkata","interval":"10s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":0,"extended_bounds":{"min":1549963071912,"max":1549963971912}},"aggregations":{"d3c67db2-1b1a-11e7-b09e-037021c4f8df":{"max":{"field":"system.diskio.read.bytes"}},"f55b9910-1b1a-11e7-b09e-037021c4f8df":{"derivative":{"buckets_path":["d3c67db2-1b1a-11e7-b09e-037021c4f8df"],"gap_policy":"skip","unit":"1s"}},"dcbbb100-1b93-11e7-8ada-3df93aab833e":{"bucket_script":{"buckets_path":{"value":"f55b9910-1b1a-11e7-b09e-037021c4f8df[normaliz
ed_value]"},"script":{"source":"params.value > 0 ? params.value : 0","lang":"painless"},"gap_policy":"skip"}}}}}}}}}] lastShard [true]
org.elasticsearch.transport.RemoteTransportException: [AB6Tx6q][:9300][indices:data/read/search[phase/query]]
Caused by:
org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.common.util.concurrent.TimedRunnable@1881ae0 on QueueResizingEsThreadPoolExecutor[name = AB6Tx6q/search, queue capacity = 1000, min queue capacity = 1000, max queue capacity = 1000, frame size = 2000, targeted response rate = 1s, task execution EWMA = 386nanos, adjustment amount = 50, org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor@56b3477[Running, pool size = 13, active threads = 10, queued tasks = 1527, completed tasks = 529703]]
at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:48) ~[elasticsearch-6.2.3.jar:6.2.3]
at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) ~[?:1.8.0_131]
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:98) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.common.util.concurrent.QueueResizingEsThreadPoolExecutor.doExecute(QueueResizingEsThreadPoolExecutor.java:88) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:93) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.search.SearchService.lambda$rewriteShardRequest$0(SearchService.java:994) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.action.ActionListener$1.onResponse(ActionListener.java:60) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:113) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.index.query.Rewriteable.rewriteAndFetch(Rewriteable.java:86) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.search.SearchService.rewriteShardRequest(SearchService.java:992) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:312) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:372) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.action.search.SearchTransportService$6.messageReceived(SearchTransportService.java:369) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:66) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.sendLocalRequest(TransportService.java:650) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.access$000(TransportService.java:77) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService$3.sendRequest(TransportService.java:138) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.sendRequestInternal(TransportService.java:598) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:518) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:558) ~[elasticsearch-6.2.3.jar:6.2.3]
at org.elasticsearch.transport.TransportService.sendChildRequest(TransportService.java:549) ~[elasticsearch-6.2.3.jar:6.2.3]
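The EsRejectedExecutionException above (1527 queued tasks against a queue capacity of 1000) suggests the search thread pool is overwhelmed. One way to watch this live while reloading the dashboard, assuming the node is reachable on localhost:9200:

```shell
# Show search thread-pool stats per node: active threads, queued tasks,
# and the cumulative count of rejected shard-level search requests.
curl -s 'localhost:9200/_cat/thread_pool/search?v&h=node_name,active,queue,rejected,completed'
```

If the rejected column keeps climbing while the dashboard loads, the shard-level requests are being dropped by Elasticsearch rather than merely timing out in Kibana.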
I tried increasing the search queue size from 1,000 to 10,000 with the following setting in elasticsearch.yml:
thread_pool.search.queue_size: 10000
But even this did not resolve the issue.
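To rule out the setting simply not being applied, the effective thread-pool configuration can be read back from the node after the restart (a minimal check, again assuming localhost:9200):

```shell
# Print the effective search thread-pool settings of every node;
# queue_size should report 10000 if elasticsearch.yml was picked up.
curl -s 'localhost:9200/_nodes/thread_pool?filter_path=nodes.*.thread_pool.search&pretty'
```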
Is there any solution other than adding more CPU, which is the suggestion I found in a couple of places? Could you please suggest what steps can be taken to resolve this issue?