Kibana metric help OR ( Bucket selector on a top metrics aggregation AND OR VEGA debug )

Hello everyone,

I stumbled across an issue I cannot resolve and I have no idea how to continue from here.

My initial situation is simple:
I have logs of 'problems' arriving every minute in this format: ID|Status.
Open problems keep sending a log every minute to say they are still open.
When a problem is closed, it sends one last log with the status close and then stops sending logs.

So for example:
(now - 3 minutes) I have 3 documents : id1|open / id2|open / id3|open

(now - 2 minutes) I have 6 documents : id1|open / id2|open / id3|open /// id1|close / id2|open / id3|open

(now - 1 minute) I have 8 documents : id1|open / id2|open / id3|open /// id1|close / id2|open / id3|open /// id2|open / id3|close

(now) I have 9 documents : id1|open / id2|open / id3|open /// id1|close / id2|open / id3|open /// id2|open / id3|close /// id2|open

What I want :

  1. A table that shows only OPEN problems
  2. A metric that shows the number of OPEN problems

At (now), I want to get the number of open problems (problems that have never been closed).
So at n-3 I have 3 open problems / at n-2 I have 2 open problems / at n-1 I have 1 open problem / at n I have 1 open problem
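To make the expected result concrete, here is a minimal Python sketch of the logic I am after, run on the example above. The `LOGS` list and the `open_problems_at` helper are only illustrative names, not something that exists in my setup: a problem counts as open at a given minute if its latest log up to that minute is not a close.

```python
# Sample logs from the example above, as (minute, problem id, status).
# Minute 0 is (now - 3 minutes) and minute 3 is (now).
LOGS = [
    (0, "id1", "open"), (0, "id2", "open"), (0, "id3", "open"),
    (1, "id1", "close"), (1, "id2", "open"), (1, "id3", "open"),
    (2, "id2", "open"), (2, "id3", "close"),
    (3, "id2", "open"),
]


def open_problems_at(logs, minute):
    """Return the sorted ids whose latest status at `minute` is not 'close'."""
    latest = {}
    # Walk the logs in time order so later statuses overwrite earlier ones.
    for ts, problem_id, status in sorted(logs, key=lambda log: log[0]):
        if ts <= minute:
            latest[problem_id] = status
    return sorted(pid for pid, status in latest.items() if status != "close")


for minute in range(4):
    ids = open_problems_at(LOGS, minute)
    print(minute, len(ids), ids)
```

The table I want is the list of ids, and the metric is the length of that list.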

What I tried :

  1. I made a table that aggregates by ID and then takes the last value (by timestamp) of the status. It shows all problems, even the closed ones, but that is OK: since closed problems don't send new logs, I can restrict the table's time range to the last minute (now - 1min) and only the open problems remain.
    It kind of works, but it is still not ideal, since a problem that was just closed may still show as open for a minute.

  2. I tried TSVB, the aggregation-based metric, the Lens metric, and runtime fields.
    None of them gave me what I wanted, but maybe I just failed to build something that works and someone here can help me.

What kinda worked :

Vega) I tried Vega as a last resort and I feel I can almost touch a solution ( even though I would love not to use Vega and find a way with the other types of metric )

Here is a request that gathers all the problems and shows the latest status of each:

GET myindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "eventType.keyword": "problems"
          }
        }
      ]
    }
  },
  "aggs": {
    "open_problems": {
      "terms": {
        "field": "problemId.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_status": {
          "top_metrics": {
            "metrics": {
              "field": "problemStatus.keyword"
            },
            "size": 1,
            "sort": {
              "@timestamp": "desc"
            }
          }
        },
        "filtered_buckets": {
          "bucket_selector": {
            "buckets_path": {
              "latestStatus": "latest_status"
            },
            "script": "params.latestStatus != 'close'"
          }
        }
      }
    }
  }
}

I couldn't get bucket_selector to work, even after reading: Filtering after a top_hits have been applied or bucket selector on buckets, which can select based on buckets which has top_hits applied · Issue #94967 · elastic/elasticsearch · GitHub

I feel like the .keyword field is what breaks everything?

So I tried to use the request without the bucket_selector
It gives :

{
  "took": 801,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "problems": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "id1",
          "doc_count": 17558,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-06-26T07:24:00.002Z"
                ],
                "metrics": {
                  "problemStatus.keyword": "open"
                }
              }
            ]
          }
        },
        {
          "key": "id2",
          "doc_count": 17141,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-06-26T09:21:00.028Z"
                ],
                "metrics": {
                  "problemStatus.keyword": "close"
                }
              }
            ]
          }
        }

So I wanted to use VEGA to make it work but I couldn't wrap my head around Vega ....

Vega :


{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "all_problems",
      "url": {
        "index": "myindex",
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "eventType.keyword": "problems"
                  }
                }
              ]
            }
          },
          "aggs": {
            "problems": {
              "terms": {
                "field": "problemId.keyword",
                "size": 10000
              },
              "aggs": {
                "latest_status": {
                  "top_metrics": {
                    "metrics": {
                      "field": "problemStatus.keyword"
                    },
                    "size": 1,
                    "sort": {
                      "@timestamp": "desc"
                    }
                  }
                }
              }
            }
          }
        },
        "format": {"property": "aggregations.problems.buckets"}
      }
    }
  ],
  "marks": [
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "middle"},
          "fill": {"value": "black"},
          "fontSize": {"value": 32}
        },
        "update": {
          "x": {"signal": "width / 2"},
          "y": {"signal": "height / 2"},
          "text": {"signal": "datum"}
        }
      }
    }
  ]
}

Can you help me find a solution ( Preferably without VEGA ) for my two problems ?

Thanks a lot in advance!

I don't think you can avoid Vega for this very specific use case. The problem I see in your case is that the bucket selector can't filter on a keyword: the metrics referenced in buckets_path have to be numeric. This discussion was very insightful.

So I replicated your scenario with an additional integer field, statusId, and left out the eventType filter on problems. The search part below is therefore a simple match_all query, without any of the date filtering that would be necessary if you want to go back in time.

In the Kibana Dev Console:

# Clean up
DELETE discuss-336899

# Create an index
PUT discuss-336899
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      },
      "id": {
        "type": "keyword"
      },
      "status": {
        "type": "keyword"
      },
      "statusId": {
        "type": "integer"
      }
    }
  }
}

# Add data from problem definition
POST discuss-336899/_bulk
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "1", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "3", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "1", "status": "close", "statusId": 2}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "3", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:54:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:54:00+0200", "id": "3", "status": "close", "statusId": 2}
{ "index": {}}
{ "timestamp": "2023-07-19T11:56:00+0200", "id": "2", "status": "open", "statusId": 1}
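In a real index the documents only carry the keyword status, so statusId would have to be added when the logs are indexed. As a rough sketch of that step (in Python, and assuming the mapping open = 1 and close = 2 used in the sample data; `STATUS_IDS` and `bulk_lines` are hypothetical names, not part of any Elastic client), the bulk body could be generated like this:

```python
import json

# Hypothetical mapping from the keyword status to a numeric code.
# The values match the sample data above (open = 1, close = 2).
STATUS_IDS = {"open": 1, "close": 2}


def bulk_lines(events):
    """Build the NDJSON body of a _bulk request, adding statusId to each event."""
    lines = []
    for event in events:
        doc = dict(event, statusId=STATUS_IDS[event["status"]])
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document line
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


events = [
    {"timestamp": "2023-07-19T11:50:00+0200", "id": "1", "status": "open"},
    {"timestamp": "2023-07-19T11:52:00+0200", "id": "1", "status": "close"},
]
print(bulk_lines(events))
```

An ingest pipeline or the log shipper could do the same translation; the point is only that the bucket selector needs a numeric field to look at.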

So now the query that returns only the problems whose latest status is open would be:


GET discuss-336899/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "open_problems": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest_status": {
          "top_metrics": {
            "metrics": [
              {"field": "status"},
              {"field": "statusId"}
            ],
            "size": 1,
            "sort": {
              "timestamp": "desc"
            }
          }
        },
        "only_open": {
          "bucket_selector": {
            "buckets_path": {
              "latestStatus": "latest_status.statusId"
            },
            "script": "params.latestStatus == 1"
          }
        }
      }
    }
  }
}

And the result correctly shows only the one open problem, number 2:

{
  ...
  "aggregations": {
    "open_problems": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2",
          "doc_count": 4,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-07-19T09:56:00.000Z"
                ],
                "metrics": {
                  "status": "open",
                  "statusId": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}

As you can see, the bucket selector points to the numeric statusId metric (latest_status.statusId), which the script can then compare against 1.
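For the metric, the number of open problems is just the number of buckets that remain. If you end up post-processing the response yourself (in Vega or in a script), you can also filter on the keyword status directly and skip the bucket selector. A minimal Python sketch, assuming a response shaped like the ones in this thread; `open_problems` is a made-up helper and the `response` dict is hand-written from the sample data, with the closed problem 1 kept in to show the filtering:

```python
def open_problems(response, agg_name="open_problems", status_field="status"):
    """Return the keys of the buckets whose latest status is 'open'."""
    buckets = response["aggregations"][agg_name]["buckets"]
    open_ids = []
    for bucket in buckets:
        top = bucket["latest_status"]["top"]
        # top_metrics with size 1 returns at most one entry per bucket.
        if top and top[0]["metrics"].get(status_field) == "open":
            open_ids.append(bucket["key"])
    return open_ids


# Hand-written response, trimmed to the fields the helper reads.
response = {
    "aggregations": {
        "open_problems": {
            "buckets": [
                {
                    "key": "1",
                    "doc_count": 2,
                    "latest_status": {"top": [{
                        "sort": ["2023-07-19T09:52:00.000Z"],
                        "metrics": {"status": "close", "statusId": 2},
                    }]},
                },
                {
                    "key": "2",
                    "doc_count": 4,
                    "latest_status": {"top": [{
                        "sort": ["2023-07-19T09:56:00.000Z"],
                        "metrics": {"status": "open", "statusId": 1},
                    }]},
                },
            ]
        }
    }
}

ids = open_problems(response)
print(ids)       # rows for the table
print(len(ids))  # value for the metric
```

With your original field names you would pass status_field="problemStatus.keyword" and agg_name="problems".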

You posted some time ago, so this may reach you too late, but I hope it can serve as a future reference.
