Average value of buckets in wide range

Hello,
I'm trying to solve a problem involving the average value of buckets over a small interval across a wide time range.

Example:
I have 10 services; each sends a message every 5 seconds with its state: "ACTIVATE", "STOPPED", etc.

I would like to show the state over a time range. Here is an example with a one-second interval on @timestamp:
[screenshot: chart with one-second buckets]

When I set the range to more than 15 minutes, the problem appears:
[screenshot: the same chart over a 15+ minute range]

Now the chart shows more than 10 services, but I'm only running 10. I think it should instead show, for example, an average value computed from the finer-grained timestamps. Do you know how I could do that?

The second graph goes above 10 on the y-axis because a service gets counted twice if its state changes within a 30-second interval. The unique count aggregation works per bucket, and on the x-axis you're bucketing by both time and state.
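For reference, here's roughly the shape of the request Kibana sends for that chart (a sketch only; the index name "services" and the service identifier field "service_id" are placeholders for whatever you're actually using). The unique count metric is a cardinality agg under the hood:

GET /services/_search
{
  "size": 0,
  "aggs": {
    "per_30s": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30s"
      },
      "aggs": {
        "by_state": {
          "terms": {
            "field": "state"
          },
          "aggs": {
            "unique_services": {
              "cardinality": {
                "field": "service_id"
              }
            }
          }
        }
      }
    }
  }
}

A service whose state changes in the middle of a bucket appears under two different state values, so the unique counts across states can add up to more than 10.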

For each 30-second interval, what are you actually trying to see?

Hi,
thanks for understanding my problem so well.

I want something like this: if a service changes state within a 30-second interval, I want to take the "average" of those states. For example, if within 30 seconds we get 5 failed and 1 active, I want to count it as failed, because that state occurred most often.

Other example:
first service: 4 failed, 1 active, 1 initializing -> FAILED
second service: 5 active, 1 failed -> ACTIVE
and on the chart, for that 30-second @timestamp bucket, we would see 1 failed and 1 active instead of 2 active, 2 failed and 1 initializing.

Thanks for the extra explanation.

I'm sorry to say I can't think of a way to do this. I asked some of the folks who work on the Visualize app in Kibana and they're also stumped. It's likely not possible to do this in Kibana at the moment. I can't think of a way to do this with raw Elasticsearch queries either, so I'm going to move this question to the ES forum to see if anyone there has an idea. If they can come up with a way to do it with ES, then we can create a GitHub ticket for exposing that functionality in Kibana.

If the criterion is to always take the state with the greatest count, a terms agg on the state field, ordered by doc_count descending with size 1, will work. Basically, it aggregates together all the states at time=t, partitioned by user, and then retains only the state with the greatest doc count... giving you the "averaged" state for that user at that point in time.

This doesn't handle more complex situations, like what happens if there is a tie. For that you'd probably have to set the size to >1 and do some client-side processing to figure out which value to use (a sketch of that variant follows the example below).

Here's an example:

PUT /test/
{
  "mappings": {
    "_doc": {
      "properties": {
        "state": {
          "type": "keyword"
        }
      }
    }
  }
}

POST /test/_doc/_bulk
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "stopped"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "stopped"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "stopped"  }

GET /test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "histo": {
          "date_histogram": {
            "field": "timestamp",
            "interval": 1
          },
          "aggs": {
            "states": {
              "terms": {
                "field": "state",
                "order": {
                  "_count": "desc"
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
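To handle the tie case mentioned above, a minimal variant (just a sketch, with the client-side logic left out) is to bump the innermost size to 2 and let the client break ties:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "histo": {
          "date_histogram": {
            "field": "timestamp",
            "interval": 1
          },
          "aggs": {
            "states": {
              "terms": {
                "field": "state",
                "order": {
                  "_count": "desc"
                },
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}

If the top two states in a bucket come back with equal doc counts, the client decides which one wins (e.g. always preferring "failed" over "active").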

@polyfractal our problem then is that there is a bucket for each user, so when Kibana goes to visualize this, the bars in the chart will be split into 10 sections (one per user) instead of 3 (one per state). Is there any way (perhaps with pipeline aggs?) to take the results of the terms agg you suggested and get the total count for each state, so that we can create a graph like the second one in the original post?

Ah, I see. Not sure; that might not be possible... pipeline aggs don't work too well with terms aggs. I'll poke at it tomorrow and see if there's a way.
