How does the date_histogram aggregation choose its buckets? Is this tunable?

Michael_Herold · October 16, 2014, 4:03pm

I'm trying to use elasticsearch to give me 30-day statistics for a given
collection of models (pertinent fields are a date in created_at and an
integer in value). Currently, I have this query/aggregation:

    {
      "query": {
        "match_all": {}
      },
      "aggregations": {
        "by_30_days": {
          "date_histogram": {
            "field": "created_at",
            "interval": "30d",
            "min_doc_count": 0,
            "extended_bounds": {
              "min": 1381881600000,
              "max": 1413503999000
            }
          },
          "aggregations": {
            "stats": {
              "extended_stats": {
                "field": "value"
              }
            }
          }
        }
      }
    }

The extended_bounds values are generated dynamically: "min" is 365 days ago
(2013-10-16 00:00:00 +0000) and "max" is the end of today
(2014-10-16 23:59:59 +0000). The name by_30_days is only a placeholder for
the aggregation name.

It's working as expected, except for one thing: the buckets don't line up
as expected. For some reason, the last bucket always starts on 2014-10-07
00:00:00 +0000, regardless of what data is in elasticsearch. I have tried
this aggregation on a bunch of different date ranges, including:

1 model instance per day for the past 30 days
1 model instance per day for the past 365 days
1 model instance total, for a created_at of 2014-09-30
1 model instance total, for a created_at of 2014-10-15
1 model instance total, for a created_at of 2014-10-16
1 model instance total, for a created_at of 2014-10-31

I have also tried to adjust the extended bounds, which doesn't shift the
bucket dates at all.

The result is that the last bucket always has a start date of 2014-10-07.
This throws off the statistics because the last bucket doesn't cover a full
30 days of material, whereas the rest of the buckets do.

My questions:

- Why are the buckets always pivoting around October 7th? My expectation
  is that it pivots around 30 days prior to extended_bounds["max"].
- Is there a way to tune this?

Thank you in advance for any help you can give.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/87fe5659-50c5-4870-8139-12a680b94c9e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · October 16, 2014, 11:56pm

Hi Michael,

Histogram aggregations return buckets whose keys are multiples of the
interval. You are getting this odd offset because not all months have
exactly 30 days. Would setting "interval" to "month" fix the issue?


--
Adrien Grand


Michael_Herold · October 17, 2014, 12:20am

Hi Adrien,

Thank you for the reply. I actually want 30-day buckets, not one-month
buckets, for the calculation I'm doing. I would understand the weird offset
if I were using months as a unit, since they are of variable length. However,
a day is always 1000 * 60 * 60 * 24 milliseconds, so why would that cause
an offset that lands on the 7th of the month?

Thank you,
Michael


jpountz · October 19, 2014, 9:44pm

Hi Michael,

The thing is that buckets are not computed backwards from the current date.
They are computed forwards from January 1st 1970 (the Unix epoch), which is
a common origin of time for computers. So the first bucket starts on
January 1st 1970, the second on January 31st, and so on. If you keep going
until October 2014, a bucket starts on the 7th (I think?).
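
To make the epoch-anchored bucketing concrete, here is a minimal Python sketch of the arithmetic. It assumes a fixed 30-day interval is simply rounded down to a multiple of the interval counted from the epoch in UTC; the helper names bucket_start_ms and to_utc are made up for illustration and are not part of Elasticsearch.

```python
from datetime import datetime, timezone

# 30 days expressed in milliseconds, matching "interval": "30d".
INTERVAL_MS = 30 * 24 * 60 * 60 * 1000


def bucket_start_ms(timestamp_ms, interval_ms=INTERVAL_MS):
    """Round a timestamp down to the nearest multiple of the interval.

    Assumption: buckets are anchored at the Unix epoch (0 ms), as described
    in the reply above.
    """
    return (timestamp_ms // interval_ms) * interval_ms


def to_utc(ms):
    """Convert epoch milliseconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


# extended_bounds "max" from the original query: 2014-10-16 23:59:59 +0000.
last_bucket = bucket_start_ms(1413503999000)
print(last_bucket // INTERVAL_MS)       # 545
print(to_utc(last_bucket).isoformat())  # 2014-10-07T00:00:00+00:00
```

Under that assumption, any timestamp from 2014-10-07 through 2014-11-05 falls into the same bucket, which would explain why the last bucket start did not move when the data or the extended bounds changed.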

I believe you could make it work the way that you expect by using the
pre_offset and post_offset options of the date histogram aggregation.


--
Adrien Grand


Michael_Herold · October 20, 2014, 2:35pm

Hi Adrien,

Thanks! The fact that the buckets are calculated from the UNIX epoch is
what I didn't understand. It always landed on October 7th, which seemed
like an arbitrary date, and that confused me. I did some quick
calculations and you're right: midnight on October 7th, 2014, is exactly
545 30-day buckets from the UNIX epoch. Huzzah!

I think you're right about the pre_offset and post_offset. I should be able
to calculate the needed offset(s) to get the effect that I want.
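
For anyone wanting to do the same calculation, here is a rough Python sketch of the offset arithmetic. It only shows how far the bucket grid would need to shift so that a boundary lands on the end of the window. How that number maps onto pre_offset and post_offset (or onto the single offset option that, as far as I know, replaced them in later releases) depends on the Elasticsearch version, so check the documentation for yours. The function names are hypothetical.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000
INTERVAL_MS = 30 * DAY_MS


def alignment_offset_ms(anchor_ms, interval_ms=INTERVAL_MS):
    """Shift needed so that a bucket boundary falls exactly on anchor_ms."""
    return anchor_ms % interval_ms


def shifted_bucket_start_ms(timestamp_ms, offset_ms, interval_ms=INTERVAL_MS):
    """Bucket start when the epoch-anchored grid is shifted by offset_ms."""
    return ((timestamp_ms - offset_ms) // interval_ms) * interval_ms + offset_ms


# End of the window: one second past extended_bounds "max",
# i.e. 2014-10-17 00:00:00 +0000.
window_end_ms = 1413503999000 + 1000
offset_ms = alignment_offset_ms(window_end_ms)

# With the shifted grid, the last bucket covers the full 30 days before
# the end of the window.
last_start_ms = shifted_bucket_start_ms(1413503999000, offset_ms)
print(offset_ms // DAY_MS)  # 10
print(datetime.fromtimestamp(last_start_ms / 1000, tz=timezone.utc).date())  # 2014-09-17
```

Because the window end moves every day, the offset would have to be recomputed for each request, in the same way the extended_bounds values are generated.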

Thank you for taking the time to explain this to me. I appreciate it!

Best,
Michael

