Kibana metric help OR ( Bucket selector on a top metrics aggregation AND OR VEGA debug )

LucasE · June 26, 2023, 11:25am

Hello everyone,

I stumbled across an issue I cannot resolve and I have no idea how to continue from here.

My initial situation is simple:
I receive logs about 'problems' every minute in this format: ID|Status.
Open problems keep sending a log every minute to say they are still open.
When a problem is closed, it sends one last log with the status close and then stops sending logs.

So for example:
(now - 3 minutes) I have 3 documents: id1|open / id2|open / id3|open

What I want:

A table that shows only OPEN problems
A metric that shows the number of OPEN problems

I want to get, at (now), the number of open problems (problems whose latest status is not close).
So at n-3 I have 3 open problems / n-2 I have 2 open problems / n-1 I have 1 open problem / n I have 1 open problem
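
To make the expected counts concrete, here is a minimal Python sketch of the logic being asked for: keep the latest log per ID and count the IDs whose latest status is not close. The sample records, the minute numbering (0 stands for n-3) and the function name open_problems are illustrative assumptions, not data from the real index.

```python
from typing import Iterable

# Hypothetical sample logs as (minute, problem_id, status); minute 0 is "n-3".
LOGS = [
    (0, "id1", "open"), (0, "id2", "open"), (0, "id3", "open"),
    (1, "id1", "close"), (1, "id2", "open"), (1, "id3", "open"),
    (2, "id2", "open"), (2, "id3", "close"),
    (3, "id2", "open"),
]


def open_problems(logs: Iterable[tuple[int, str, str]], now: int) -> list[str]:
    """Return the IDs whose most recent status at or before `now` is not 'close'."""
    latest: dict[str, tuple[int, str]] = {}
    for minute, problem_id, status in logs:
        if minute > now:
            continue  # ignore logs that are newer than the moment we look at
        # Keep only the newest log seen so far for each problem ID.
        if problem_id not in latest or minute >= latest[problem_id][0]:
            latest[problem_id] = (minute, status)
    return sorted(pid for pid, (_, status) in latest.items() if status != "close")


for minute in range(4):
    ids = open_problems(LOGS, minute)
    print(f"n-{3 - minute}: {len(ids)} open -> {ids}")
```

The table would list the returned IDs and the metric would show their count.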

What I tried:

I made a table that aggregates by ID and then takes the last value (by timestamp) of the status. It shows all problems, even the closed ones, but that's OK: since closed problems don't send new logs, I can restrict the table to (now - 1min) and only the open problems remain.
It kind of works, but it is still not ideal, since sometimes problems that were open may still show for a minute.
I tried TSVB, aggregation-based metrics, Lens Metric, and runtime fields.
None of them gave me what I wanted, but maybe I just failed to build something that works and someone here can help me.

What kinda worked:

Vega: I tried Vega as a last resort, and I feel I can almost touch a solution (even though I would love not to use Vega and to find a way with the other types of metric).

Here is a request that gathers all the problems and shows their status:

GET myindex/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "eventType.keyword": "problems"
          }
        }
      ]
    }
  },
  "aggs": {
    "open_problems": {
      "terms": {
        "field": "problemId.keyword",
        "size": 10000
      },
      "aggs": {
        "latest_status": {
          "top_metrics": {
            "metrics": {
              "field": "problemStatus.keyword"
            },
            "size": 1,
            "sort": {
              "@timestamp": "desc"
            }
          }
        },
        "filtered_buckets": {
          "bucket_selector": {
            "buckets_path": {
              "latestStatus": "latest_status"
            },
            "script": "params.latestStatus != 'close'"
          }
        }
      }
    }
  }
}

And I couldn't get bucket_selector to work, even with the help of this issue: Filtering after a top_hits have been applied or bucket selector on buckets, which can select based on buckets which has top_hits applied · Issue #94967 · elastic/elasticsearch · GitHub

I feel like the .keyword is breaking everything?

So I tried the request without the bucket_selector.
It gives:

{
  "took": 801,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "problems": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "id1",
          "doc_count": 17558,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-06-26T07:24:00.002Z"
                ],
                "metrics": {
                  "problemStatus.keyword": "open"
                }
              }
            ]
          }
        },
        {
          "key": "id2",
          "doc_count": 17141,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-06-26T09:21:00.028Z"
                ],
                "metrics": {
                  "problemStatus.keyword": "close"
                }
              }
            ]
          }
        }
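
Since this response already carries the latest status of every bucket, the closed ones can also be dropped outside Elasticsearch. The following is only a sketch of that client-side filtering in Python, not something Kibana does for you: RESPONSE is a trimmed copy of the output above, and the helper name open_buckets is made up for illustration.

```python
# Trimmed copy of the response above (the aggregation is named "problems" there).
RESPONSE = {
    "aggregations": {
        "problems": {
            "buckets": [
                {"key": "id1", "doc_count": 17558, "latest_status": {"top": [
                    {"sort": ["2023-06-26T07:24:00.002Z"],
                     "metrics": {"problemStatus.keyword": "open"}}]}},
                {"key": "id2", "doc_count": 17141, "latest_status": {"top": [
                    {"sort": ["2023-06-26T09:21:00.028Z"],
                     "metrics": {"problemStatus.keyword": "close"}}]}},
            ]
        }
    }
}


def open_buckets(response, agg="problems", metric="problemStatus.keyword"):
    """Keep the buckets whose latest top_metrics status is not 'close'."""
    kept = []
    for bucket in response["aggregations"][agg]["buckets"]:
        top = bucket["latest_status"]["top"]
        # An empty "top" list means no status was found; treat it as not open.
        if top and top[0]["metrics"][metric] != "close":
            kept.append(bucket)
    return kept


still_open = open_buckets(RESPONSE)
print(len(still_open), [bucket["key"] for bucket in still_open])
```

The length of the returned list is the value the metric should display, and the keys are the rows of the table.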

So I wanted to use Vega to make it work, but I couldn't wrap my head around Vega...

Vega:


{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "data": [
    {
      "name": "all_problems",
      "url": {
        "index": "myindex",
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "filter": [
                {
                  "term": {
                    "eventType.keyword": "problems"
                  }
                }
              ]
            }
          },
          "aggs": {
            "problems": {
              "terms": {
                "field": "problemId.keyword",
                "size": 10000
              },
              "aggs": {
                "latest_status": {
                  "top_metrics": {
                    "metrics": {
                      "field": "problemStatus.keyword"
                    },
                    "size": 1,
                    "sort": {
                      "@timestamp": "desc"
                    }
                  }
                }
              }
            }
          }
        },
        "format": {"property": "aggregations.problems.buckets"}
      }
    }
  ],
  "marks": [
    {
      "type": "text",
      "encode": {
        "enter": {
          "align": {"value": "center"},
          "baseline": {"value": "middle"},
          "fill": {"value": "black"},
          "fontSize": {"value": 32}
        },
        "update": {
          "x": {"signal": "width / 2"},
          "y": {"signal": "height / 2"},
          "text": {"signal": "datum"}
        }
      }
    }
  ]
}

Can you help me find a solution (preferably without Vega) for my two problems?

Thanks a lot in advance

jsanz · July 19, 2023, 10:36am

I don't think you can avoid Vega for this very specific use case. The problem I see in your case is that the bucket selector can't filter on a keyword: the metric referenced in buckets_path must be numeric. This discussion was very insightful:

github.com/elastic/elasticsearch

Filtering after a top_hits have been applied or bucket selector on buckets, which can select based on buckets which has top_hits applied

opened 12:05PM - 01 Apr 23 UTC

closed 01:33PM - 15 May 23 UTC

mat013

>enhancement :Analytics/Aggregations Team:Analytics

### Description

I am unsure whether it is a bug (because I could not find anything in the documentation) or it new feature, and I have tried to ask various forums and googled to see if I could come up with an answer to my problem.

I have a use case (which I imagine is not that unique), which is as follows: I am trying to create a query in elasticsearch, which is able to retrieve the latest documents for each group and then filter on some criteria on those documents which has been found.

As an example: Say the following documents are indexed in myindex in elasticsearch:

POST /myindex/_bulk
{ "index":{} }
{ "objid": 1, "ident":"group1","version":1, "chdate": 1, "field1" : 1}
{ "index":{} }
{ "objid": 2, "ident":"group1","version":2, "chdate": 2, "field1" : 0}
{ "index":{} }
{ "objid": 3, "ident":"group1","version":2, "chdate": 3, "field1" : 1}
{ "index":{} }
{ "objid": 4, "ident":"group1","version":2, "chdate": 4, "field1" : 0}
{ "index":{} }
{ "objid": 5, "ident":"group1","version":3, "chdate": 1, "field1" : 0}

I would like to find all documents, which has field1 set to x for the document with the highest chdate, for each ident and version, where the selected document has field1 set to x.

In a case where x is 0 then the documents, which has objid 4 and 5 should be returned
In a case where x is 1 then the documents, which has objid 1 should be returned

Initially I tried to do following query

{
  "size": 0,
  "aggs": {
    "by_ident": {
      "terms": { "field": "ident.keyword", "size": 10 },
      "aggs": {
        "by_version": {
          "terms": { "field": "version", "size": 10000 },
          "aggs": {
            "by_latest": {
              "top_hits": { "sort": [{ "chdate": { "order": "desc" } }], "size": 1 }
            }
          }
        }
      }
    }
  }
}

However it is not possible to apply a filter afterwards to the top hits

ChatGPT suggested to use a bucket selector

{
  "size": 0,
  "aggs": {
    "ident": {
      "terms": { "field": "ident" },
      "aggs": {
        "version": {
          "terms": { "field": "version" },
          "aggs": {
            "top_hits_agg": {
              "top_hits": { "size": 1, "sort": [ { "chdate": { "order": "desc" } } ] }
            },
            "field1_filter": {
              "bucket_selector": {
                "buckets_path": {
                  "hits": "top_hits_agg.hits.hits",
                  "field1": "top_hits_agg.hits.hits._source.field1"
                },
                "script": { "source": "params.field1 == 0" }
              }
            }
          }
        }
      }
    }
  }
}

However it fails with: Validation Failed: 1: No aggregation found for path [top_hits_agg.hits.hits._source.field1]

So what I would like to suggest is one of the following:
1. Add a filtering block to top_hits
2. Make bucket_selector able to select a top hits bucket.

Personally I think option 2 will be more applicable, however option 1 would maybe more easier to comprehend when somebody has to maintain the query later on.

So I replicated your scenario with an additional integer field, statusId (1 for open, 2 for close), and without the eventType filter on problems, so the search part below is a simple match_all query. It also leaves out the date filtering you would need if you want to go back in time.
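
The numeric statusId has to come from somewhere before indexing. As a rough sketch of that enrichment step in Python, assuming the same mapping as the sample data in this reply (open is 1, close is 2); where this would run in a real pipeline (log shipper, ingest step, and so on) is left open, and the helper name with_status_id is made up for illustration.

```python
# Assumed mapping, matching the sample data in this reply: open -> 1, close -> 2.
STATUS_IDS = {"open": 1, "close": 2}


def with_status_id(doc: dict) -> dict:
    """Return a copy of the document with a numeric statusId added."""
    enriched = dict(doc)  # copy so the caller's document is left untouched
    enriched["statusId"] = STATUS_IDS[doc["status"]]
    return enriched


sample = {"timestamp": "2023-07-19T11:52:00+0200", "id": "1", "status": "close"}
print(with_status_id(sample))
```

An unknown status raises a KeyError here, which is deliberate in this sketch: silently assigning a default code could hide a closed problem.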

In the Kibana Dev Console:

# Clean up
DELETE discuss-336899

# Create an index
PUT discuss-336899
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      },
      "id": {
        "type": "keyword"
      },
      "status": {
        "type": "keyword"
      },
      "statusId": {
        "type": "integer"
      }
    }
  }
}

# Add data from problem definition
POST discuss-336899/_bulk
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "1", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:50:00+0200", "id": "3", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "1", "status": "close", "statusId": 2}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:52:00+0200", "id": "3", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:54:00+0200", "id": "2", "status": "open", "statusId": 1}
{ "index": {}}
{ "timestamp": "2023-07-19T11:54:00+0200", "id": "3", "status": "close", "statusId": 2}
{ "index": {}}
{ "timestamp": "2023-07-19T11:56:00+0200", "id": "2", "status": "open", "statusId": 1}

So now the query that returns only the problems that are still open would be:


GET discuss-336899/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "open_problems": {
      "terms": {
        "field": "id",
        "size": 10000
      },
      "aggs": {
        "latest_status": {
          "top_metrics": {
            "metrics": [
              {"field": "status"},
              {"field": "statusId"}
            ],
            "size": 1,
            "sort": {
              "timestamp": "desc"
            }
          }
        },
        "only_open": {
          "bucket_selector": {
            "buckets_path": {
              "latestStatus": "latest_status.statusId"
            },
            "script": "params.latestStatus ==1"
          }
        }
      }
    }
  }
}

And the result correctly shows only problem 2, the one that is still open.

{
  ...
  "aggregations": {
    "open_problems": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "2",
          "doc_count": 4,
          "latest_status": {
            "top": [
              {
                "sort": [
                  "2023-07-19T09:56:00.000Z"
                ],
                "metrics": {
                  "status": "open",
                  "statusId": 1
                }
              }
            ]
          }
        }
      ]
    }
  }
}

As you can see, the bucket selector points to the numeric statusId metric (latest_status.statusId) for the scripted filter, instead of the keyword status.

You posted some time ago, so this may reach you late, but I hope it can serve as a future reference.
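
For anyone who wants to reuse the query with other field names, here is a small Python sketch that only builds the same request body as a dictionary. The function name build_open_problems_query and its parameters are hypothetical; the defaults reproduce the query from this reply, and nothing is sent to Elasticsearch.

```python
import json


def build_open_problems_query(id_field="id", status_field="status",
                              status_id_field="statusId",
                              timestamp_field="timestamp",
                              open_code=1, max_ids=10000):
    """Build the search body: latest status per ID, filtered on the numeric code."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "open_problems": {
                "terms": {"field": id_field, "size": max_ids},
                "aggs": {
                    "latest_status": {
                        "top_metrics": {
                            "metrics": [{"field": status_field},
                                        {"field": status_id_field}],
                            "size": 1,
                            "sort": {timestamp_field: "desc"},
                        }
                    },
                    "only_open": {
                        "bucket_selector": {
                            # The path must resolve to a number, hence the numeric field.
                            "buckets_path": {
                                "latestStatus": f"latest_status.{status_id_field}"
                            },
                            "script": f"params.latestStatus == {int(open_code)}",
                        }
                    },
                },
            }
        },
    }


body = build_open_problems_query()
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after GET discuss-336899/_search in the Dev Console. The number of buckets that come back is the number of open problems.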

system · August 16, 2023, 10:37am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
