How to get DAU / WAU / ... charts in ElasticSearch?


(Eugene Glotov) #1

Hello.

Sorry for my strange English.

I'm trying to use pipeline aggregations (still deep in the manual) with a goal to get a line chart with a complex calculation for each day on the calendar.

In common, I'm looking a way to get (L)DAU / (L)WAU / (L)MAU charts (examples, SQL-syntax example with Google BigQuery). And then more complex charts (like ARPU, which need to divide one line daily measures to the another line measures) that used the data from WAU etc.

I wrote a query that can show the data for Daily Active Users (DAU) chart:

DAU chart data query
GET sessions-*/_search
{
  "size": 0,
  "aggs": {
    "dau": {
      "filter": {
        "bool": {
          "must": [
            {...},
            {
              "range": {
                "@timestamp": {
                  "lte": "now",
                  "gte": "now-30d/d"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "interval_aggregation": {
          "date_histogram": {
            "field": "@timestamp",
            "interval": "1d"
          },
          "aggs": {
            "distinct_visitors": {
              "cardinality": {
                "field": "user_id"
              }
            }
          }
        }
      }
    }
  }
}
DAU query response
"aggregations": {
    "wau": {
      "doc_count": 1991,
      "interval_aggregation": {
        "buckets": [
          {
             ...
            "distinct_visitors": {
              "value": 90
            }
          },
          {
             ...
            "distinct_visitors": {
              "value": 103
            }
          },
        ]
      }
    }
}

But for Weekly Active Users (WAU) this query is not enough. I can set interval as "7d" and it will return a good value, but I need to see this measure for every calendar day.

So, for today I can get one WAU value with this request:

WAU query just for today (on day start)
GET sessions-*/_search
{
  "size": 0,
  "aggs": {
    "wau": {
      "filter": {
        "bool": {
          "must": [
            {...},
            {
              "range": {
                "@timestamp": {
                  "lte": "now/d",
                  "gte": "now-7d/d"
                }
              }
            }
          ]
        }
      },
      "aggs": {
        "distinct_visitors": {
          "cardinality": {
            "field": "user_id"
          }
        }
      }
    }
  }
}
WAU one day query response
"aggregations": {
    "wau": {
      "doc_count": 3492,
      "distinct_visitors": {
        "value": 18
      }
    }
}

How I can get the WAU value for each calendar day?


How to get ES aggregation data as Logstash input?
(Eugene Glotov) #2

I thinking about this variants:

  1. I need to replace this part with something like variables.

In MySQL it can be something like this: (in reality can't because of the problem)
SELECT @day := day as day, wau.visitors
(SELECT visitors FROM ...impossible wau query... WHERE date=@day) as wau FROM (SELECT day FROM ...crazy query for calendar range... )

In ElasticSearch maybe it is possible with scripting? But how?

... WAU aggregation inside scripted loop through a calendar executed for each day:
 "lte": "{{ CALENDAR_DAY_VARIABLE }}",
 "gte": "{{ CALENDAR_DAY_VARIABLE }}-6d/d"
  1. I need to use pipeline aggregations. It is should be something like a "LEFT JOIN".
  • On the left side – a calendar – date histogram, but without using any index data.
  • On the right side – a WAU aggregation that should be called for each day. Some variable is needed again here.
  1. I need to use pipeline aggregations again. In the nested aggregation level I need to make a calendar and get any dates from them. In the top aggregations level I need to get WAU aggregation for any date from nested level, using nested.day instead of "now" in "lte", "gte" and/or "interval" params.

Any ideas about this? Maybe someone already makes this charts with ES?

P.S.: I thinking about cardinally different variant then I can call the simple DAU / WAU / ... queries for every day on application level and cache the result inside a different ES index.

It should be much more faster of course and then I'll can use this index inside exists Kibana visualisations (especially inside Timelion).


(Hain Wang) #3

I come to the same problem, track the DAU/WAU data, have you got your script work, can you share it?


(Eugene Glotov) #4

Unfortunately, I can't share the php script because of commercial NDA. In short, inside I have a cycle that requesting each period from ES and sends the result to output. For example, load every N days, where N is console argument. Each result is separate Json document, so you can index this and then compare with other documents or place them all to different charts, such as DAU. And this tool can ask stats for one day and for 7 last days (for future WAU) and for a 28 days. So, if you will call the utility once, you will have 3 documents. If you call the tool every day (or will add some additional args such as 'offset'), you will have stat for each day that you need. Then just save all of new documents to separate ES index and place on charts.
See the logstash config example here.


(system) #5