Average value of buckets in wide range

Hello,
I'm trying to solve a problem involving the average value of buckets over a small interval across a wide time range.

Example:
I have 10 services; each sends a message every 5 seconds with its state: "ACTIVATE", "STOPPED", etc.

I would like to show the state over a time range. Here is an example with a one-second interval on @timestamp:
[screenshot: chart with one-second buckets]

When I set the range to more than 15 minutes, the problem appears:
[screenshot: the same chart over a 15+ minute range]

Now the chart shows more than 10 services, but I'm only running 10. I think it should instead show, for example, an average value computed from the finer-grained timestamps. Do you know how I could do that?

The second graph goes above 10 on the y-axis because a service gets counted twice if its state changes within a 30-second interval. The unique count aggregation works per bucket, and on the x-axis you're bucketing by both time and state.
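For reference, here's roughly the shape of the request Kibana sends for that chart (a sketch only; the index name "services" and the service identifier field "service_id" are placeholders for whatever you're actually using). The unique count metric is a cardinality agg under the hood:

GET /services/_search
{
  "size": 0,
  "aggs": {
    "per_30s": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "30s"
      },
      "aggs": {
        "by_state": {
          "terms": {
            "field": "state"
          },
          "aggs": {
            "unique_services": {
              "cardinality": {
                "field": "service_id"
              }
            }
          }
        }
      }
    }
  }
}

A service whose state changes in the middle of a bucket appears under two different state values, so the unique counts across states can add up to more than 10.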

For each 30-second interval, what are you actually trying to see?

Hi,
thanks for understanding my problem so well.

I want something like this: if a service changes state within a 30-second interval, I want to take the "average" of those states. For example, if within 30 seconds we get 5 failed and 1 active, I want to count it as failed, because that state occurred most often.

Other example:
first service: 4 failed, 1 active, 1 initializing -> FAILED
second service: 5 active, 1 failed -> ACTIVE
and on the chart, for that 30-second @timestamp bucket, we would see 1 failed and 1 active instead of 2 active, 2 failed and 1 initializing.

Thanks for the extra explanation.

I'm sorry to say I can't think of a way to do this. I asked some of the folks who work on the Visualize app in Kibana and they're also stumped. It's likely not possible to do this in Kibana at the moment. I can't think of a way to do this with raw Elasticsearch queries either, so I'm going to move this question to the ES forum to see if anyone there has an idea. If they can come up with a way to do it with ES, then we can create a GitHub ticket for exposing that functionality in Kibana.

If the criterion is to always take the state with the greatest count, a terms agg on the state field, ordered by doc_count descending with size 1, will work. Basically, it aggregates together all the states at time=t, partitioned by user, and then retains only the state with the greatest doc count... giving you the "averaged" state for that user at that point in time.

This doesn't handle more complex situations, like what happens if there is a tie. For that you'd probably have to set the size to >1 and do some client-side processing to figure out which value to use (a sketch of that variant follows the example below).

Here's an example:

PUT /test/
{
  "mappings": {
    "_doc": {
      "properties": {
        "state": {
          "type": "keyword"
        }
      }
    }
  }
}

POST /test/_doc/_bulk
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 1, "timestamp": 1, "state": "stopped"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "started"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "stopped"  }
{ "index" : {} }
{ "user": 2, "timestamp": 1, "state": "stopped"  }

GET /test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "histo": {
          "date_histogram": {
            "field": "timestamp",
            "interval": 1
          },
          "aggs": {
            "states": {
              "terms": {
                "field": "state",
                "order": {
                  "_count": "desc"
                },
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
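To handle the tie case mentioned above, a minimal variant (just a sketch, with the client-side logic left out) is to bump the innermost size to 2 and let the client break ties:

GET /test/_search
{
  "size": 0,
  "aggs": {
    "user": {
      "terms": {
        "field": "user"
      },
      "aggs": {
        "histo": {
          "date_histogram": {
            "field": "timestamp",
            "interval": 1
          },
          "aggs": {
            "states": {
              "terms": {
                "field": "state",
                "order": {
                  "_count": "desc"
                },
                "size": 2
              }
            }
          }
        }
      }
    }
  }
}

If the top two states in a bucket come back with equal doc counts, the client decides which one wins (e.g. always preferring "failed" over "active").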

@polyfractal our problem then is that there is a bucket for each user, so when Kibana goes to visualize this, the bars in the chart will be split into 10 sections (one per user) instead of 3 (one per state). Is there any way (perhaps with pipeline aggs?) to take the results of the terms agg you suggested and get the total count for each state, so that we can create a graph like the second one in the original post?

Ah, I see. Not sure; that might not be possible... pipeline aggs don't work too well with terms aggs. I'll poke at it tomorrow and see if there's a way.
