I'm trying to use elasticsearch to give me 30-day statistics for a given
collection of models (pertinent fields are a date in created_at and an
integer in value). Currently, I have this query/aggregation:
{
  "query": {
    "match_all": {}
  },
  "aggregations": {
    "date_histogram": {
      "field": "created_at",
      "interval": "30d",
      "min_doc_count": 0,
      "extended_bounds": {
        "min": 1381881600000, // Dynamically generated for 365 days ago (This is 2013-10-16 00:00:00 +0000)
        "max": 1413503999000 // Dynamically generated for end of today (This is 2014-10-16 23:59:59 +0000)
      }
    },
    "aggregations": {
      "stats": {
        "extended_stats": {
          "field": "value"
        }
      }
    }
  }
}
It's working as expected, except for one thing: the buckets don't line up
as expected. For some reason, the last bucket always starts on 2014-10-07
00:00:00 +0000, regardless of what data is in elasticsearch. I have tried
this aggregation on a bunch of different date ranges, including:
- 1 model instance per day for the past 30 days
- 1 model instance per day for the past 365 days
- 1 model instance total, for a created_at of 2014-09-30
- 1 model instance total, for a created_at of 2014-10-15
- 1 model instance total, for a created_at of 2014-10-16
- 1 model instance total, for a created_at of 2014-10-31
I have also tried to adjust the extended bounds, which doesn't shift the
bucket dates at all.
The result is that the last bucket always has a date of 2014-10-07.
This throws off the statistics because the last bucket isn't a full 30 days
of material, whereas the rest of the buckets are.
My questions:
- Why are the buckets always pivoting around October 7th? My expectation
is that it pivots around 30 days prior to extended_bounds["max"].
- Is there a way to tune this?
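For reference, the "dynamically generated" bounds in the query above can be reproduced with a short script. This is only a sketch: the helper names and the aggregation name per_30_days are made up for illustration (the query as posted does not show an aggregation name), and the request body is just built and printed, not sent to a cluster.

```python
import json
from datetime import datetime, timedelta, timezone


def epoch_ms(dt):
    """Convert an aware datetime on a whole second to epoch milliseconds."""
    return int(dt.timestamp()) * 1000


def build_request(now):
    """Build the request body with bounds derived from `now` (UTC)."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    bounds_min = epoch_ms(midnight - timedelta(days=365))        # midnight, 365 days ago
    bounds_max = epoch_ms(midnight + timedelta(days=1)) - 1000   # 23:59:59 today
    return {
        "query": {"match_all": {}},
        "aggregations": {
            "per_30_days": {  # hypothetical aggregation name
                "date_histogram": {
                    "field": "created_at",
                    "interval": "30d",
                    "min_doc_count": 0,
                    "extended_bounds": {"min": bounds_min, "max": bounds_max},
                },
                "aggregations": {
                    "stats": {"extended_stats": {"field": "value"}}
                },
            }
        },
    }


body = build_request(datetime(2014, 10, 16, 12, 0, tzinfo=timezone.utc))
print(json.dumps(body, indent=2))
```

With "now" fixed to 2014-10-16, the bounds come out as 1381881600000 and 1413503999000, the same values as in the query above.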
Histogram aggregations return buckets that are a multiple of the interval.
You are getting this weird offset because not all months have exactly 30
days. Setting "interval" to "month" should fix the issue?
Thank you for the reply. I actually want 30 day buckets, not one month
buckets, for the calculation I'm doing. I would understand the weird offset
if I was using months as a unit since they are of variable length. However,
a day is always 1000 * 60 * 60 * 24 milliseconds, so why would that cause
an offset that is the 7th of the month?
Thank you,
Michael
On Thursday, October 16, 2014 6:56:39 PM UTC-5, Adrien Grand wrote:
Hi Michael,
Histogram aggregations return buckets that are a multiple of the interval.
You are getting this weird offset because not all months have exactly 30
days. Setting "interval" to "month" should fix the issue?
On Thu, Oct 16, 2014 at 6:03 PM, Michael Herold <michael....@gmail.com> wrote:
My questions:
- Why are the buckets always pivoting around October 7th? My
expectation is that it pivots around 30 days prior to extended_bounds["max"].
- Is there a way to tune this?
The thing is that buckets are not computed backwards from the current date,
but forwards from January 1st 1970 (the Unix epoch), which is a common
origin of time for computers. So the first bucket would start on
January 1st 1970, the second on January 31st, and so on; if you keep
doing it until October 2014, a bucket would start on the 7th (I think?).
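The rounding described above can be checked with a few lines of arithmetic. This is a minimal sketch of epoch-aligned bucketing in general, not Elasticsearch's actual implementation; the function names are made up for illustration.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000
INTERVAL_MS = 30 * DAY_MS  # the "30d" interval from the query


def bucket_start(timestamp_ms, interval_ms=INTERVAL_MS):
    """Round a timestamp down to a multiple of the interval since the epoch."""
    return timestamp_ms - (timestamp_ms % interval_ms)


def to_utc(timestamp_ms):
    """Render epoch milliseconds as an aware UTC datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)


end_of_today = 1413503999000  # extended_bounds "max" from the question
start = bucket_start(end_of_today)

print(start // INTERVAL_MS)       # 545 whole 30-day intervals since the epoch
print(to_utc(start).isoformat())  # 2014-10-07T00:00:00+00:00
```

Because the boundaries depend only on the epoch and the interval, neither the indexed data nor extended_bounds can move them, which matches what the question observed.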
I believe you could make it work the way that you expect by using the
pre_offset and post_offset options of the date histogram aggregation:
Thanks! The fact that the buckets start calculating from the UNIX epoch is
what I didn't understand. The fact that it always landed on October 7th --
which seems like an arbitrary date -- confused me. I did some quick
calculations and you're right; midnight on October 7th, 2014, is 545
30-day-buckets from the UNIX epoch. Huzzah!
I think you're right about the pre_offset and post_offset. I should be able
to calculate the needed offset(s) to get the effect that I want.
Thank you for taking the time to explain this to me. I appreciate it!
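The offset calculation mentioned above could look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, the goal is assumed to be a bucket boundary that falls exactly at the end of today, and how the resulting value maps onto pre_offset and post_offset should be checked against the documentation for the Elasticsearch version in use.

```python
from datetime import datetime, timezone

DAY_MS = 24 * 60 * 60 * 1000
INTERVAL_MS = 30 * DAY_MS


def alignment_offset(window_end_ms, interval_ms=INTERVAL_MS):
    """Offset (ms) that moves epoch-aligned boundaries onto window_end_ms."""
    return window_end_ms % interval_ms


def shifted_bucket_start(timestamp_ms, offset_ms, interval_ms=INTERVAL_MS):
    """Bucket start when boundaries sit at k * interval + offset."""
    return ((timestamp_ms - offset_ms) // interval_ms) * interval_ms + offset_ms


bounds_max = 1413503999000   # 2014-10-16 23:59:59 +0000, from the question
window_end = bounds_max + 1000  # next midnight: where the last bucket should end

offset = alignment_offset(window_end)
last_start = shifted_bucket_start(bounds_max, offset)

print(offset // DAY_MS)  # 10 (days)
print(datetime.fromtimestamp(last_start / 1000, tz=timezone.utc).isoformat())
# 2014-09-17T00:00:00+00:00
```

With a 10-day shift, the last bucket covers 2014-09-17 through the end of 2014-10-16, a full 30 days, and every earlier bucket stays 30 days long.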