How to make histogram by grouped aggregation count?

Artur_Zhdanov · March 20, 2018, 8:56am

Hello,
I have some events, which user did.
/events:
{"type":"register", "user.id":"1", "event.id": "1"}
{"type":"register", "user.id":"1", "event.id": "2"}
{"type":"register", "user.id":"1", "event.id": "3"}
{"type":"register", "user.id":"2", "event.id": "1"}
{"type":"register", "user.id":"2", "event.id": "4"}
{"type":"register", "user.id":"2", "event.id": "5"}
{"type":"register", "user.id":"3", "event.id": "1"}
{"type":"register", "user.id":"3", "event.id": "3"}

I want to visualise count of users by events count in date range.

So, I do aggregation by user.id and have events count:

user #1 register to event 3 times
user #2 register to event 3 times
user #3 register to event 2 times

However, the next step is group by events count. I don't understand how to make it in Kibana.
Please, help me.

timroes · March 20, 2018, 9:43am

Hey, that is unfortunately not possible. If I understand you correctly, you want to group on the result of the aggregation, which Kibana can't do for you.

Could you perhaps describe your use-case a bit more, what should the chart present, that you are trying to achieve? Maybe there is another solution to the problem.

Cheers,
Tim

Artur_Zhdanov · March 20, 2018, 9:55am

Thanks.

Sure, this is a cohort analysis.
I want to know the retention of users and see how often and how many users visit my events per specific time interval.

timroes · March 20, 2018, 9:58am

I think in that case I would switch the aggregations a bit.

You could do a terms aggregation on the event id, so you get one bar chart per event. If you now are interested in how many users visited those events per specific time interval, instead of drawing the count of documents on the y-axis, which would be the overall visits (of all users) for that event, switch the metrics aggregation to "Unique Count" of the user id field. That way you will get only the number of unique user ids, that caused that event.

And the time interval can of course be specified using the time picker on top, but you could also add further limiting time filters to the visualization via the filter bar on top.

Hope that visualization is the direction you need.

Cheers,
Tim

Artur_Zhdanov · March 20, 2018, 10:18am

I thinks this is not correct, because each bar will be contain count of user in an event, but I want to see how many users visited 10 events, how many users visited 9 events, etc...

Artur_Zhdanov · March 20, 2018, 10:46am

I need something like this SQL query:

SELECT
  t.events_count AS visited_events_count,
  count(t.events_count) AS users_count
FROM (
       SELECT
         user_id,
         count(*) AS events_count
       FROM event_user
       WHERE created_at BETWEEN '2018-03-18 00:00:00' AND '2018-03-20 00:00:00'
       GROUP BY user_id
       ORDER BY count(*) DESC
     ) AS t
GROUP BY t.events_count

visited_events_count	users_count
1	193
2	82
3	38
4	15
5	4
6	3
7	2

timroes · March 20, 2018, 10:47am

Yeah, that's true. Sorry I misunderstood your goal there.

So drawing the count of events a user had on the x-axis and the amount of users that hit that many events on the y-axis, is unfortunately not possible, since you cannot achieve that within a single Elasticsearch query.

Artur_Zhdanov · March 20, 2018, 10:49am

Ok, thanks.

timroes · March 20, 2018, 10:55am

You SQL example shows this very well, that you need multiple requests to achieve the desired result. Unfortunately Kibana is currently mostly build upon the concept of a single request per visualization using aggregations. Nevertheless you could achieve the result via Elasticseach, but not with classical Kibana visualizations. You could do a request with a terms aggregation on the user id and a unique count on the event id. That way you would get a list of buckets one for each user with the value of how many events s/he visited. Now you would just need to group by the event count in your system.

If you run Kibana 6.2+ you might be able to achieve that visualization from within Kibana using Vega visualization, which is a way more advanced visualization grammar allowing for customized requests and some post processing.

I will try to figure out a way to visualize that chart via Vega - but classical visualizations for sure won't work right now.

timroes · March 20, 2018, 11:40am

I was able to achieve the desired Graph via Vega visualizations available from 6.2+ in Kibana core as an experimental visualization and you can use the plugin if you are running an older version.

The following Vega lite specification draws your desired bar chart:

{
  $schema: https://vega.github.io/schema/vega-lite/v2.json
  data: {
    url: {
      %timefield%: @timestamp
      %context%: true
      index: /events-* [or your index pattern]
      body: {
        size: 0
        aggs: {
          users: {
            terms: {
              field: "user.id"
              size: 10000
              order: {eventCount: "desc"}
            }
            aggs: {
              eventCount: {
                cardinality: {field: "event.id"}
              }
            }
          }
        }
      }
    }
    format: {property: "aggregations.users.buckets"}
  }
  mark: bar
  transform: [
    {
      aggregate: [
        {op: "count", field: "key", as: "usercount"}
      ]
      groupby: ["eventCount.value"]
    }
  ]
  encoding: {
    x: { field: "eventCount\\.value", type: "ordinal", sort: "descending" }
    y: { field: "usercount", type: "quantitative" }
  }
}

If you make sure the field names are correct, that will do a terms aggregation on the user.id field and calculate the amount of unique event.ids each user had.

After that we use Vega's transform to aggregate all buckets by their eventCount.value (i.e. the value of the unique count of events), so the result will look like:

[
  { "eventCount.value": 5, usercount: 1 },
  { "eventCount.value": 4, usercount: 3 },
  { "eventCount.value": 3, usercount: 20 }, // ...
]

We'll then just use Vega lite encodings to draw this as a bar chart (or you could of course use any other chart you want). The period in eventCount.value must be escaped in the encoding, since Vega will otherwise try to find a nested field eventCount: { value: ... }, but in our case it's just the name of the field containing a period, caused by the groupby aggregation.

Hope that chart is closer to what you are looking for.

Cheers,
Tim

Artur_Zhdanov · March 20, 2018, 12:11pm

Thank you so much. It works perfectly!

timroes · March 20, 2018, 12:13pm

Great that it could help. For any additional styling, the official Vega lite docs are usually quite helpful and also have a look at Yuri's blog post about Vega in Kibana for a general introduction.

system · April 17, 2018, 12:13pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Aggregation + histogram in kibana? Kibana	2	482	June 16, 2017
Bar Chart with two aggregations & filter Kibana	2	383	October 28, 2019
Vega Integration in Kibana Dashboard Kibana	2	320	August 10, 2020
How to make a table report by grouped aggregation count? Kibana	4	10085	April 18, 2018
Histogram (bars) with counts as bins Kibana	9	7185	October 23, 2018

How to make histogram by grouped aggregation count?

Related topics