@vamsikrishna_medeti Sorry for the late reply.
If a query/process is very heavy (causing OOM errors), the ES Stack Monitoring plugin will start throttling the collection rate, which results in gaps on the chart. There are a couple of things we'll need to figure out first:
- During this period when the chart is showing gaps, are there any logs/errors in the ES/Kibana console?
- Have you tried identifying the query causing the OOM (usually the slowest one)? You can turn on query logging with the following settings in kibana.yml; the queries Kibana sends should then show up in its server logs:
monitoring.elasticsearch.hosts: ["http://localhost:9200"]
monitoring.elasticsearch.logQueries: true
logging.verbose: true
- Try running the output rate query below independently and see if it occasionally times out (or if the result also has gaps); there is a rough script sketch after the query for repeating this check. Be sure to replace the cluster_uuid with your own:
GET .monitoring-beats-6-*,.monitoring-beats-7-*/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "cluster_uuid": "Q4FBbFszTj6jnCWhBG0Pgw"
          }
        },
        {
          "range": {
            "beats_stats.timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "check": {
      "date_histogram": {
        "field": "beats_stats.timestamp",
        "fixed_interval": "30s"
      },
      "aggs": {
        "metric": {
          "max": {
            "field": "beats_stats.metrics.libbeat.output.events.total"
          }
        },
        "metric_deriv": {
          "derivative": {
            "buckets_path": "event_rate",
            "gap_policy": "skip",
            "unit": "1s"
          }
        },
        "beats_uuids": {
          "terms": {
            "field": "beats_stats.beat.uuid",
            "size": 1
          },
          "aggs": {
            "event_rate_per_beat": {
              "max": {
                "field": "beats_stats.metrics.libbeat.output.events.total"
              }
            }
          }
        },
        "event_rate": {
          "sum_bucket": {
            "buckets_path": "beats_uuids>event_rate_per_beat",
            "gap_policy": "skip"
          }
        }
      }
    }
  }
}
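If it's easier to repeat that check over time, here is a minimal sketch that runs the same query in a loop, prints how long each run took, and flags 30s buckets with no monitoring documents (those are what show up as gaps on the chart). It assumes the cluster is reachable at http://localhost:9200 with security disabled and uses the Python requests library; check_output_rate and its time_range parameter are just names made up for this sketch, so adjust the URL/auth and cluster_uuid for your setup.

"""Minimal sketch: re-run the Stack Monitoring output-rate query and flag gaps.

Assumptions (adjust for your setup):
  * Elasticsearch reachable at http://localhost:9200 with security disabled
    (pass auth=(user, password) to requests.post otherwise)
  * CLUSTER_UUID replaced with your own cluster_uuid
"""
import time

import requests

ES_URL = "http://localhost:9200"
INDEX = ".monitoring-beats-6-*,.monitoring-beats-7-*"
CLUSTER_UUID = "Q4FBbFszTj6jnCWhBG0Pgw"  # replace with your own


def build_query(time_range: str = "now-1h") -> dict:
    """Same request body as the console query above, with the range as a parameter."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"cluster_uuid": CLUSTER_UUID}},
                    {"range": {"beats_stats.timestamp": {"gte": time_range}}},
                ]
            }
        },
        "aggs": {
            "check": {
                "date_histogram": {
                    "field": "beats_stats.timestamp",
                    "fixed_interval": "30s",
                },
                "aggs": {
                    "metric": {
                        "max": {"field": "beats_stats.metrics.libbeat.output.events.total"}
                    },
                    "metric_deriv": {
                        "derivative": {
                            "buckets_path": "event_rate",
                            "gap_policy": "skip",
                            "unit": "1s",
                        }
                    },
                    "beats_uuids": {
                        "terms": {"field": "beats_stats.beat.uuid", "size": 1},
                        "aggs": {
                            "event_rate_per_beat": {
                                "max": {
                                    "field": "beats_stats.metrics.libbeat.output.events.total"
                                }
                            }
                        },
                    },
                    "event_rate": {
                        "sum_bucket": {
                            "buckets_path": "beats_uuids>event_rate_per_beat",
                            "gap_policy": "skip",
                        }
                    },
                },
            }
        },
    }


def check_output_rate(time_range: str = "now-1h") -> None:
    """Run the query once, print how long it took, and list empty 30s buckets."""
    resp = requests.post(f"{ES_URL}/{INDEX}/_search", json=build_query(time_range), timeout=30)
    resp.raise_for_status()
    body = resp.json()
    buckets = body["aggregations"]["check"]["buckets"]
    empty = [b["key_as_string"] for b in buckets if b["doc_count"] == 0]
    print(
        f"range={time_range} took={body['took']}ms timed_out={body['timed_out']} "
        f"buckets={len(buckets)} empty={len(empty)}"
    )
    for ts in empty:
        print(f"  gap at {ts}")  # a 30s slot with no monitoring docs at all


if __name__ == "__main__":
    # repeat the check every minute for a while to catch intermittent timeouts/gaps
    for _ in range(10):
        check_output_rate()
        time.sleep(60)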
- One thing to also try is different time ranges: instead of the default of 1h, try things like 15m or 6h. This way we can figure out whether it's a max-bucket issue.
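With the sketch above, trying different windows is just a different time_range argument, for example:

# compare a short, default, and longer window using check_output_rate from the sketch above
for window in ("now-15m", "now-1h", "now-6h"):
    check_output_rate(time_range=window)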
- This might also be because the cluster's resources are under-provisioned. Have you tried increasing the number of nodes or the JVM heap/memory?
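Before adding nodes or heap, it can help to confirm that heap really is under pressure. A quick way to look at per-node heap via the _cat/nodes API (same localhost/no-auth assumption as the sketch above):

import requests

# per-node heap usage, configured heap, RAM and CPU; persistently high heap.percent
# alongside the OOM errors would point at under-provisioned JVM heap
resp = requests.get(
    "http://localhost:9200/_cat/nodes",
    params={"v": "true", "h": "name,heap.percent,heap.max,ram.percent,cpu"},
    timeout=10,
)
print(resp.text)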