I'm trying to create a search (ultimately a visualization) that will give me the total number of active Controller nodes in a cluster. Each node reports metrics every 15 seconds, and each document contains the value for that specific node: the active controller reports "1", and all other nodes report "0".
My goal is to get the latest value for each node and then sum those values; if the result is anything other than 1, something is wrong. Obviously, if the sum includes values from more documents than just the latest one per node, the result will be off.
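To make the intended computation concrete, here is a minimal Python sketch of what I expect the aggregation to do: keep only the newest document per node, then sum the controller-count values. (The sample documents below are invented for illustration; real documents carry the metric under prometheus.metrics.kafka_controller_active_controller_count.)

```python
# Sample documents as (host, timestamp, active_controller_count) tuples.
# Values are invented for illustration only.
docs = [
    ("node1", 100, 0), ("node1", 115, 0),
    ("node2", 100, 1), ("node2", 115, 1),
    ("node3", 100, 0),
    ("node4", 100, 0),
]

# Keep only the newest document per host.
latest = {}
for host, ts, value in docs:
    if host not in latest or ts > latest[host][0]:
        latest[host] = (ts, value)

# Sum the latest values; anything other than 1 means trouble.
total = sum(value for _, value in latest.values())
print(total)  # expected: 1 for a healthy cluster
```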
This is my current query:
GET metricbeat-8.6.2/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "prometheus.metrics.kafka_controller_active_controller_count"
          }
        }
      ],
      "must": [
        {
          "terms": {
            "host.name": [
              "node1",
              "node2",
              "node3",
              "node4"
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "Per_Broker": {
      "terms": {
        "field": "host.name",
        "size": 4
      },
      "aggs": {
        "Last_Value": {
          "terms": {
            "field": "@timestamp",
            "size": 1,
            "order": {
              "_key": "desc"
            }
          },
          "aggs": {
            "Active_Controller": {
              "sum": {
                "field": "prometheus.metrics.kafka_controller_active_controller_count"
              }
            }
          }
        }
      }
    },
    "Number_of_Active_Controllers": {
      "sum_bucket": {
        "buckets_path": "Per_Broker['Last_Value'].Active_Controller"
      }
    }
  },
  "_source": false,
  "fields": [
    "host.name",
    "prometheus.metrics.kafka_controller_active_controller_count"
  ],
  "size": 0
}
The result:
{
  "took": 19,
  "timed_out": false,
  "_shards": {
    "total": 19,
    "successful": 19,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 4043,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "Per_Broker": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "node1",
          "doc_count": 1081,
          "Last_Value": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1080,
            "buckets": [
              {
                "key": 1698237171688,
                "key_as_string": "2023-10-25T12:32:51.688Z",
                "doc_count": 1,
                "Active_Controller": {
                  "value": 0
                }
              }
            ]
          }
        },
        {
          "key": "node2",
          "doc_count": 1026,
          "Last_Value": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1025,
            "buckets": [
              {
                "key": 1698237168398,
                "key_as_string": "2023-10-25T12:32:48.398Z",
                "doc_count": 1,
                "Active_Controller": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "node3",
          "doc_count": 992,
          "Last_Value": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 991,
            "buckets": [
              {
                "key": 1698237165024,
                "key_as_string": "2023-10-25T12:32:45.024Z",
                "doc_count": 1,
                "Active_Controller": {
                  "value": 0
                }
              }
            ]
          }
        },
        {
          "key": "node4",
          "doc_count": 944,
          "Last_Value": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 943,
            "buckets": [
              {
                "key": 1698237161746,
                "key_as_string": "2023-10-25T12:32:41.746Z",
                "doc_count": 1,
                "Active_Controller": {
                  "value": 0
                }
              }
            ]
          }
        }
      ]
    },
    "Number_of_Active_Controllers": {
      "value": 0
    }
  }
}
No matter how many times I run it, the Number_of_Active_Controllers bucket has the value 0, even though at least one Active_Controller bucket has the value 1 every single time. I would expect the result to be 1, since 0 + 1 + 0 + 0 = 1.
Do I have a problem in my buckets_path? This was the only way I could get both a path that was accepted as valid at all and a metric that resolved to a number rather than an object.
Bonus question: how would I translate this into a Kibana Metric or Gauge visualization? I've tried every which way, but I can't find any way to get specifically the last value per node, the way I can with the query above.