Date histogram and derivative on doc_count


#1

Let say I have many stores and I want to show the stores which had the most growth in number of visits between January and February.

So far I’m using date_histogram to get the numbers per month and per store with this query :

query: {
  range: {
    visited_at: {
      gte: '2016-01-01T00:00:00Z',
      lt: '2016-03-01T00:00:00Z'
    }
  }
},
size: 0,
aggs: {
  months: {
    date_histogram: {
      field: 'visited_at',
      interval: 'month'
    },
    aggs: {
      stores: {
        terms: {
          size: 0,
          field: 'store_id'
        }
      }
    }
  }
}

And it give me a result like this:

"aggregations": {
    "months": {
      "buckets": [
        {
          "key_as_string": "2016-01-01T00:00:00.000+00:00",
          "key": 1451574000000,
          "doc_count": 300,
          "stores": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 1,
                "doc_count": 100
              },
              {
                "key": 2,
                "doc_count": 200
              }
            ]
          }
        },
        {
          "key_as_string": "2016-02-01T00:00:00.000+00:00",
          "key": 1454252400000,
          "doc_count": 550,
          "stores": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": 1,
                "doc_count": 150
              },
              {
                "key": 2,
                "doc_count": 400
              }
            ]
          }
        }
      ]
    }
  }

With this I’m fetching the data for all the stores and then comparing the growth in my code but I’m hoping there is a query that would let Elasticsearch calculate the growth and return me only the top n.

I tried some Pipeline aggregations but I couldn’t manage to get what I wanted because I can' manage to use the doc_count value returned and use it for a derivative aggregation.
Any idea how I can do that?

I guess another way to improve that would be to have a batch compute the monthly growth at the end of each month and then store it. Does Elasticsearch has something that could do this automatically?

FYI I'm on Elasticseach 2.2 and I'm using this for the growth: (feb_result - jan_result) / jan_result

Thx,
J


(system) #2