Aggregation on a datetime field

living_jordi · August 3, 2016, 11:44am

I'm trying to get aggregated information from a datetime field:

Get how many documents are there by month (NOT YEAR/MONTH!, ONLY MONTH): So, a 'groupby(month(datetime_field))'.

Thanks for all.

cbuescher · August 3, 2016, 11:56am

Just so I understand the question right: if you have docs with timestamps 2015-01-25, 2015-02-15, 2016-01-25 you want two buckets, one for january with two docs and one for february with one doc?

living_jordi · August 3, 2016, 11:58am

Yes, I've already solved that, setting to 1 minDocCount.
I'm yet failing with the first question.

cbuescher · August 3, 2016, 12:04pm

If you want to aggregate on month only (regardless of year) I think you have to extract that month information and index it into a separate field (e.g. numeric from 1-12), then aggregate on that field.

living_jordi · August 3, 2016, 12:09pm

Ooouuu! Isn't here any more way?: using an script...?

nik9000 · August 3, 2016, 12:19pm

I'm fairly sure you can use a script, yes. I like to start exploring scripts by making script fields for things like doc['field'].value.class so I can lookup the javadoc of the thing. I'm fairly sure that times come to you as joda's Instant class. So I'm guessing you could do doc['field'].value.toDateTime().monthOfYear().getAsText(). I expect that'd do the conversion in UTC but I'm not sure.

cbuescher · August 3, 2016, 12:50pm

@nik9000 thats great, I played around with it but unfortunately datetime fields are stored as longs, but you can still use Joda (at least with groovy scripts). I got the following working, might be what @living_jordi is looking for:

PUT /sales/t/1
{
  "date" : "2015-01-01"
}

PUT /sales/t/2
{
  "date" : "2015-02-01"
}

PUT /sales/t/3
{
  "date" : "2016-01-01"
}

PUT /sales/t/4
{
  "date" : "2011-01-01"
}

GET /sales/t/_search
{
  "aggs": {
    "group_by_month": {
      "terms": {
        "script": "def opDate = new DateTime(doc['date'].date); opDate.getMonthOfYear()",
        "order": {
          "_term": "asc"
        }
      }
    }
  }
}

==>

  "aggregations": {
    "group_by_month": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 3
        },
        {
          "key": "2",
          "doc_count": 1
        }
      ]
    }
  }

In terms of performance, storing the month as a separate field during indexing will still be faster though (provided you don't already have all your docs indexed and can't update them), no?

nik9000 · August 3, 2016, 1:08pm

Thanks for that! There are lots of things I wasn't thinking of that you solved there.

I dunno if it'd be worth restoring everything with the months pulled out. You'd make the index larger and reduce disk locality a bit to save running the script and the allocations it causes. If this isn't something that you do all the time I'd be happy with the script.

Now if you had to search by month, that is a different story!

Topic		Replies	Views
Aggregations on date fields Elasticsearch	2	1365	February 28, 2017
Aggregation: Difference of Two Date Fields to Aggregate Timedeltas Elasticsearch	1	1962	May 30, 2017
Aggregation using part of a timestamp field Kibana	5	553	July 21, 2022
Get number of days in a month to aggregate data Elasticsearch	7	1404	December 4, 2019
“Day Of Month” aggregation Elasticsearch	3	1824	July 5, 2017

Aggregation on a datetime field

Related topics