Aggregation on a datetime field

I'm trying to get aggregated information from a datetime field:

  • Get how many documents are there by month (NOT YEAR/MONTH!, ONLY MONTH): So, a 'groupby(month(datetime_field))'.

Thanks for all.

Just so I understand the question right: if you have docs with timestamps 2015-01-25, 2015-02-15, 2016-01-25 you want two buckets, one for january with two docs and one for february with one doc?

Yes, I've already solved that, setting to 1 minDocCount.
I'm yet failing with the first question.

If you want to aggregate on month only (regardless of year) I think you have to extract that month information and index it into a separate field (e.g. numeric from 1-12), then aggregate on that field.

Ooouuu! Isn't here any more way?: using an script...?

I'm fairly sure you can use a script, yes. I like to start exploring scripts by making script fields for things like doc['field'].value.class so I can lookup the javadoc of the thing. I'm fairly sure that times come to you as joda's Instant class. So I'm guessing you could do doc['field'].value.toDateTime().monthOfYear().getAsText(). I expect that'd do the conversion in UTC but I'm not sure.

@nik9000 thats great, I played around with it but unfortunately datetime fields are stored as longs, but you can still use Joda (at least with groovy scripts). I got the following working, might be what @living_jordi is looking for:

PUT /sales/t/1
{
  "date" : "2015-01-01"
}

PUT /sales/t/2
{
  "date" : "2015-02-01"
}

PUT /sales/t/3
{
  "date" : "2016-01-01"
}

PUT /sales/t/4
{
  "date" : "2011-01-01"
}

GET /sales/t/_search
{
  "aggs": {
    "group_by_month": {
      "terms": {
        "script": "def opDate = new DateTime(doc['date'].date); opDate.getMonthOfYear()",
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}

==>

  "aggregations": {
    "group_by_month": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 3
        },
        {
          "key": "2",
          "doc_count": 1
        }
      ]
    }
  }

In terms of performance, storing the month as a separate field during indexing will still be faster though (provided you don't already have all your docs indexed and can't update them), no?

Thanks for that! There are lots of things I wasn't thinking of that you solved there.

I dunno if it'd be worth restoring everything with the months pulled out. You'd make the index larger and reduce disk locality a bit to save running the script and the allocations it causes. If this isn't something that you do all the time I'd be happy with the script.

Now if you had to search by month, that is a different story!