Pipeline aggregations: apply Histogram over Terms results


(Michael Kuzmin) #1

Hi

I'm trying to analyze user retention metrics based on access logs.
My index contains documents like these:
{ "timestamp": "2015-11-01", "user_id": "1" }
{ "timestamp": "2015-11-02", "user_id": "1" }
{ "timestamp": "2015-11-02", "user_id": "2" }
{ "timestamp": "2015-11-03", "user_id": "2" }

A query


{ "aggs": { "ids": {
"terms": { "field": "user_id" },
"aggs": { "last_date": { "max": { "field": "timestamp", "format": "date" }}}
}}}

returns last login date for each user:
{ "buckets": [
    { "key": "1", "last_date": { "value_as_string": "2015-11-02"}},
    { "key": "2", "last_date": { "value_as_string": "2015-11-03"}}
]}
Now I need to find out how many unique "keys" I have for each day, using date histogram. I've tried referencing a bucket like here
{ 
  "aggs": { 
    "ids": {
      "terms": { "field": "user_id" },
      "aggs": { "last_date":  { "max": { "field": "timestamp", "format": "date" }}}
    },  
    "dates": {
      "date_histogram": {
        "buckets_path": "ids>last_date",
        "field": "timestamp",
        "interval": "day",
        "format": "date"
      }
    }
  }
}
But apparently `date_histogram` cannot accept `buckets_path` parameter.

Is there a way to apply Date histogram over a bucket? Can both these aggregations be calculated in a single query?


(system) #2