ELK 7.3 monitoring shows "No items found" for certain time ranges

YOULYU_ZHANG · August 14, 2019, 3:43am

We've built a new ES 7.3 cluster and noticed that the "Nodes" page does not show data correctly for certain time ranges.

Question: I want to confirm whether this is a known bug or a mistake in our setup.

All versions are compatible (7.3).
The index exists (see picture).

chrisronline · August 14, 2019, 2:02pm

It looks like your nodes might have had a lapse in reporting, based on what I'm seeing in these screenshots.

We can verify this by running the following query. You'll need to adjust the time range based on when you are and are not seeing the data. By default, it looks back one day ("gte": "now-1d"), which will generate a lot of results in the response.

If we see lapses of time where there are no reported documents, then my best guess is something happened with those nodes and they were not reporting their monitoring data properly. Have you checked the logs for any errors?

POST .monitoring-es-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": "node_stats"
          }
        },
        {
          "range": {
            "timestamp": {
              "gte": "now-1d"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "nodes": {
      "terms": {
        "field": "node_stats.node_id",
        "size": 10
      },
      "aggs": {
        "timestamp": {
          "terms": {
            "field": "timestamp",
            "size": 100
          }
        }
      }
    }
  }
}
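To scan that response for lapses without eyeballing it, here is a minimal Python sketch. It assumes the response JSON has already been loaded into a dict, and it treats any gap longer than three collection periods as a lapse, assuming the default 10s xpack.monitoring.collection.interval. The helper `find_reporting_gaps` and the threshold are illustrative only, not part of any Elastic API. Note that the inner terms aggregation returns at most 100 timestamps per node (ordered by doc count, then by key), so over a one-day range this only inspects part of the window; narrow the range or raise that size to see more.

```python
from typing import Dict, List, Tuple

# Assumption: the default monitoring collection interval is 10 s; allow
# three missed collections before calling the gap a reporting lapse.
DEFAULT_MAX_GAP_MS = 3 * 10_000


def find_reporting_gaps(response: dict,
                        max_gap_ms: int = DEFAULT_MAX_GAP_MS) -> Dict[str, List[Tuple[int, int]]]:
    """Return, per node_id, consecutive timestamp pairs further apart than max_gap_ms."""
    gaps: Dict[str, List[Tuple[int, int]]] = {}
    for node_bucket in response["aggregations"]["nodes"]["buckets"]:
        # For a date field, a terms bucket's "key" is epoch milliseconds.
        stamps = sorted(b["key"] for b in node_bucket["timestamp"]["buckets"])
        node_gaps = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap_ms]
        if node_gaps:
            gaps[node_bucket["key"]] = node_gaps
    return gaps
```

Nodes that never appear in the returned dict had no gap above the threshold in the timestamps the query returned.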

YOULYU_ZHANG · August 15, 2019, 4:44am

Thank you for taking the time to reply.

I should also mention that the data seems correct: I've added a screenshot showing that no data is missing (at least, data is present), and another showing that there are no errors transmitting metric data. I've also read the logs and didn't find any clues.

ES and Kibana are running in containers from docker.elastic.co.

I mean, if data were missing, the time ranges for which we get "No items found" should shift.
But the windows that show "No items found" and those that work correctly are always the same; they never shift.

I've tested every range from 1 to 120 minutes, minute by minute, and found the following; it never changes.

Time ranges that show "No items found":

(13 ... 24) minutes ago to now
(36 ... 120) minutes ago to now

Time ranges that work correctly:

(1 ... 12) minutes ago to now
(25 ... 35) minutes ago to now

Notice that the boundaries fall near multiples of 12.

If you haven't observed this on other clusters, then it's probably due to an error in our setup... I'll investigate.

By the way, here is the response to your query; I couldn't find anything suspicious in it...

https://gist.github.com/lvergergsk/a7ee7c192da16cc28b3a57f43c22e59f

{
  "took" : 158,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 3,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

This file has been truncated.

YOULYU_ZHANG · August 15, 2019, 8:47am

I got this exception when querying with a problematic time range:

https://gist.github.com/lvergergsk/ec79245bc372ab39543f3b1e7364d52a

[0], node[DfQYHMjuR2GNM2XqgWVOaw], [P], s[STARTED], a[id=LcOo-TirRW2AqZ-kONP1rQ]: Failed to execute [SearchRequest{searchType=QUERY_THEN_FETCH, indices=[.monitoring-es-7-2019.08.15, .monitoring-es-7-2019.08.14, .monitoring-es-7-2019.08.13], indicesOptions=IndicesOptions[ignore_unavailable=true, allow_no_indices=true, expand_wildcards_open=true, expand_wildcards_closed=false, allow_aliases_to_multiple_indices=true, forbid_closed_indices=true, ignore_aliases=false, ignore_throttled=true], types=[ ], routing='null', preference='null', requestCache=null, scroll=null, maxConcurrentShardRequests=0, batchedReduceSize=512, preFilterShardSize=128, allowPartialSearchResults=true, localClusterAlias=null, getOrCreateAbsoluteStartMillis=-1, ccsMinimizeRoundtrips=true, source={"size":10000,"query":{"bool":{"filter":[{"term":{"type":{"value":"node_stats","boost":1.0}}},{"term":{"cluster_uuid":{"value":"y7BPLPM4TCGLvf9Fm3z61g","boost":1.0}}},{"range":{"timestamp":{"from":1565854760575,"to":1565858360575,"include_lower":true,"include_upper":true,"format":"epoch_millis","boost":1.0}}}],"adjust_pure_negative":true,"boost":1.0}},"sort":[{"timestamp":{"order":"desc"}}],"aggregations":{"nodes":{"terms":{"field":"source_node.uuid","size":10000,"min_doc_count":1,"shard_min_doc_count":0,"show_term_doc_count_error":false,"order":[{"_count":"desc"},{"_key":"asc"}]},"aggregations":{"node_cgroup_quota":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"usage":{"max":{"field":"node_stats.os.cgroup.cpuacct.usage_nanos"}},"periods":{"max":{"field":"node_stats.os.cgroup.cpu.stat.number_of_elapsed_periods"}},"quota":{"min":{"field":"node_stats.os.cgroup.cpu.cfs_quota_micros"}},"usage_deriv":{"derivative":{"buckets_path":["usage"],"gap_policy":"skip","unit":"1s"}},"periods_deriv":{"derivative":{"buckets_path":["periods"],"gap_policy":"skip","unit":"1s"}}}},"node_cgroup_throttled":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"metric":{"max":{"field":"node_stats.os.cgroup.cpu.stat.time_throttled_nanos"}},"metric_deriv":{"derivative":{"buckets_path":["metric"],"gap_policy":"skip","unit":"1s"}}}},"node_cpu_utilization":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"metric":{"max":{"field":"node_stats.process.cpu.percent"}},"metric_deriv":{"derivative":{"buckets_path":["metric"],"gap_policy":"skip","unit":"1s"}}}},"node_load_average":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"metric":{"max":{"field":"node_stats.os.cpu.load_average.1m"}},"metric_deriv":{"derivative":{"buckets_path":["metric"],"gap_policy":"skip","unit":"1s"}}}},"node_jvm_mem_percent":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"metric":{"max":{"field":"node_stats.jvm.mem.heap_used_percent"}},"metric_deriv":{"derivative":{"buckets_path":["metric"],"gap_policy":"skip","unit":"1s"}}}},"node_free_space":{"date_histogram":{"field":"timestamp","interval":"30s","offset":0,"order":{"_key":"asc"},"keyed":false,"min_doc_count":1},"aggregations":{"metric":{"max":{"field":"node_stats.fs.total.available_in_bytes"}},"metric_deriv":{"derivative":{"buckets_path":["metric"],"gap_policy":"skip","unit":"1s"}}}}}}},"collapse":{"field":"source_node.uuid"}}}]
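The range filter in this failed request is in epoch milliseconds, which is hard to read. This Python sketch decodes it and counts how many 30s buckets each date_histogram can produce over that window, assuming fixed 30s buckets aligned to multiples of the interval (offset 0, UTC). The constant and function names are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Copied from the "range" filter and the date_histogram "interval" of the failed request.
FROM_MS = 1565854760575
TO_MS = 1565858360575
INTERVAL_MS = 30_000  # "interval": "30s"

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc(ms: int) -> datetime:
    """Convert epoch milliseconds to an aware UTC datetime without float rounding."""
    return EPOCH + timedelta(milliseconds=ms)


def histogram_bucket_count(from_ms: int, to_ms: int, interval_ms: int) -> int:
    """Number of interval-aligned buckets touched by the inclusive range [from_ms, to_ms]."""
    first_key = from_ms // interval_ms * interval_ms
    last_key = to_ms // interval_ms * interval_ms
    return (last_key - first_key) // interval_ms + 1


if __name__ == "__main__":
    print(to_utc(FROM_MS).isoformat(), "->", to_utc(TO_MS).isoformat())
    print("window (minutes):", (TO_MS - FROM_MS) / 60_000)
    print("max buckets per date_histogram:", histogram_bucket_count(FROM_MS, TO_MS, INTERVAL_MS))
```

Because the window does not start on a 30s boundary, it touches one more bucket than the plain 60 min / 30 s division suggests; with `min_doc_count: 1`, empty buckets are dropped, so this is a ceiling per histogram.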

Then I tried this request:

https://gist.github.com/lvergergsk/28e20381b8f9978d69f7e0bc793d7d05

POST .monitoring-es-*/_search
{
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": {
              "value": "node_stats",

This file has been truncated.

and got this response:

https://gist.github.com/lvergergsk/4d4d8ae1d352d33a3ab0b5286fb9303f

#! Deprecation: [interval] on [date_histogram] is deprecated, use [fixed_interval] or [calendar_interval] in the future.
{
  "took" : 198,
  "timed_out" : false,
  "_shards" : {
    "total" : 3,
    "successful" : 2,
    "skipped" : 0,
    "failed" : 1,
    "failures" : [

This file has been truncated.

The failure says "Trying to create too many buckets".
I tried changing xpack.monitoring.max_bucket_size from 10000 to 5000, but it didn't help.
I'm going to investigate further.

YOULYU_ZHANG · August 15, 2019, 9:46am

I ran

PUT /_cluster/settings
{
  "persistent" : {
    "search.max_buckets" : 20000
  }
}

and the page works properly.
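The same setting change can be scripted. This is a minimal Python sketch using only the standard library; the cluster URL `http://localhost:9200` and the helper name `max_buckets_request` are assumptions, and authentication is omitted. Sending `None` (JSON `null`) for a persistent setting resets it to its default, which is how the setting is put back later in this thread.

```python
import json
import urllib.request


def max_buckets_request(es_url: str, value) -> urllib.request.Request:
    """Build PUT /_cluster/settings for search.max_buckets; value=None resets it to the default."""
    body = json.dumps({"persistent": {"search.max_buckets": value}}).encode("utf-8")
    return urllib.request.Request(
        url=es_url.rstrip("/") + "/_cluster/settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = max_buckets_request("http://localhost:9200", 20000)  # hypothetical local cluster
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually send it (requires a reachable cluster):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```

Raising the limit works around the failure, but it also lets every aggregation in the cluster build more buckets, so it is worth reverting once a fix is available.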

@chrisronline
Could you explain why this is happening?
I mean, if I set "size": 10000 in the query, it should not create 10109 buckets, but it did...

Does that mean I have some other configuration error?
Also, I'd suggest that Kibana shouldn't act as if there are no nodes; it should report an error to the user if the monitoring query fails.
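A rough way to see why `"size": 10000` does not cap the total: neither the top-level `size` (number of hits) nor the `terms` aggregation's `size` (number of node buckets) limits the sub-aggregation buckets. In the failing request each node bucket carries six `date_histogram` sub-aggregations, and each histogram bucket also counts toward `search.max_buckets`. The Python sketch below estimates that upper bound; the formula is a simplification that ignores exactly how Elasticsearch counts buckets across shards and during the reduce phase, so treat its numbers as approximate.

```python
import math

# date_histogram sub-aggregations per node bucket in the failing request:
# node_cgroup_quota, node_cgroup_throttled, node_cpu_utilization,
# node_load_average, node_jvm_mem_percent, node_free_space.
HISTOGRAMS_PER_NODE = 6
MAX_BUCKETS_LIMIT = 10_000  # the search.max_buckets limit hit in this thread


def estimate_buckets(num_nodes: int, window_ms: int, interval_ms: int,
                     histograms_per_node: int = HISTOGRAMS_PER_NODE) -> int:
    """Rough upper bound: one terms bucket per node plus all of its histogram buckets."""
    # +1 because an unaligned window can touch one extra partial bucket.
    per_histogram = math.ceil(window_ms / interval_ms) + 1
    return num_nodes * (1 + histograms_per_node * per_histogram)


def smallest_failing_node_count(window_ms: int, interval_ms: int,
                                limit: int = MAX_BUCKETS_LIMIT) -> int:
    """Smallest node count whose estimate exceeds the limit."""
    nodes = 1
    while estimate_buckets(nodes, window_ms, interval_ms) <= limit:
        nodes += 1
    return nodes


if __name__ == "__main__":
    one_hour, thirty_s = 3_600_000, 30_000
    print(estimate_buckets(1, one_hour, thirty_s))
    print(smallest_failing_node_count(one_hour, thirty_s))
```

The bucket count grows with nodes × (window / interval). If the page picks its histogram interval from the selected time range (an assumption here), the estimate can rise and fall as the range changes rather than growing steadily.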

chrisronline · August 15, 2019, 5:50pm

Can you run this and report back on the response:

POST .monitoring-es-*/_search
{
  "size": 10000,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "type": {
              "value": "node_stats",
              "boost": 1.0
            }
          }
        },
        {
          "term": {
            "cluster_uuid": {
              "value": "y7BPLPM4TCGLvf9Fm3z61g",
              "boost": 1.0
            }
          }
        },
        {
          "range": {
            "timestamp": {
              "from": 1565855761862,
              "to": 1565859361862,
              "include_lower": true,
              "include_upper": true,
              "format": "epoch_millis",
              "boost": 1.0
            }
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [
    {
      "timestamp": {
        "order": "desc"
      }
    }
  ],
  "collapse": {
    "field": "source_node.uuid"
  }
}
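To rerun that check later without copying timestamps by hand, here is a small Python sketch that builds the same request for a window ending at a given time. It uses `gte`/`lte`, which are equivalent to the serialized `from`/`to` with `include_lower`/`include_upper` set to true, and it drops the default `boost` values; `latest_node_stats_query` is a hypothetical helper. Because hits are sorted by `timestamp` descending, `collapse` on `source_node.uuid` keeps the newest `node_stats` document per node, so the number of hits is the number of nodes that reported in the window.

```python
import json
import time
from typing import Optional


def latest_node_stats_query(cluster_uuid: str, window_ms: int = 3_600_000,
                            now_ms: Optional[int] = None) -> dict:
    """Latest node_stats document per node within [now_ms - window_ms, now_ms]."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return {
        "size": 10000,
        "query": {"bool": {"filter": [
            {"term": {"type": "node_stats"}},
            {"term": {"cluster_uuid": cluster_uuid}},
            {"range": {"timestamp": {"gte": now_ms - window_ms, "lte": now_ms,
                                     "format": "epoch_millis"}}},
        ]}},
        "sort": [{"timestamp": {"order": "desc"}}],
        "collapse": {"field": "source_node.uuid"},
    }


if __name__ == "__main__":
    # Paste the output into Kibana Dev Tools after "POST .monitoring-es-*/_search".
    print(json.dumps(latest_node_stats_query("y7BPLPM4TCGLvf9Fm3z61g"), indent=2))
```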

chrisronline · August 15, 2019, 7:15pm

BTW, there is an open issue tracking this as well: https://github.com/elastic/kibana/issues/36892

YOULYU_ZHANG · August 16, 2019, 2:46am

Thanks for the link to the related issue.

Here is the response when search.max_buckets is 20000:

https://gist.github.com/lvergergsk/8e8fb342b31e61b09a022efb433c3d1b

{
  "took" : 934,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

This file has been truncated.

And this is the response when search.max_buckets is set to null
(the default is 10000, if I remember correctly):

https://gist.github.com/lvergergsk/9ced294039778064d407e5026788ab5b

{
  "took" : 326,
  "timed_out" : false,
  "_shards" : {
    "total" : 4,
    "successful" : 4,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {

This file has been truncated.

chrisronline · August 16, 2019, 1:34pm

Hi @YOULYU_ZHANG,

So we discovered a nice bug, mentioned here, whose fix I think will resolve this issue for you. I'm working on a PR today, and it should be available in a release soon! Please track the issue for updates.

system · September 13, 2019, 1:41pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Replies	Views	Activity
Not seeing any data in the monitoring tab (Kibana, elastic-stack-monitoring)	47	1986	June 8, 2019
All nodes except Master show as “Offline” (Kibana, elastic-stack-monitoring)	12	1675	February 20, 2020
Elasticsearch monitoring stopped abruptly (Elasticsearch, elastic-stack-monitoring)	5	594	June 30, 2020
Nodes fail to list (Kibana, elastic-stack-monitoring)	7	341	September 5, 2019
Kibana monitoring errors (Kibana, elastic-stack-monitoring)	4	858	March 19, 2019