Date histogram & time zones


(Eric Jain) #1

I created a facet like so:

FacetBuilders.dateHistogramFacet("date")
    .field("timestamp").interval("month").zone("-08:00");

And extracted the results like so:

DateHistogramFacet date = facets.facet(DateHistogramFacet.class, id);
for (DateHistogramFacet.Entry entry : date.entries()) {
    long time = entry.getTime();
    ...
}

The first time value was 1296518400000, which is
2011-02-01T00:00:00.000Z or 2011-01-31T16:00:00.000-08:00. Shouldn't
this have been 2011-02-01T00:00:00.000-08:00?
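
The two readings of the returned value can be checked with a few lines of Java. This is only a sketch: it uses the JDK's java.time classes rather than anything from elasticsearch, and the class and method names are made up for illustration.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;

final class EpochMillisCheck {
    // Epoch millis of an ISO-8601 timestamp that carries an explicit offset.
    static long toMillis(String isoWithOffset) {
        return OffsetDateTime.parse(isoWithOffset).toInstant().toEpochMilli();
    }

    // The same instant, viewed at a fixed offset given in whole hours.
    static OffsetDateTime at(long epochMillis, int offsetHours) {
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.ofHours(offsetHours));
    }

    public static void main(String[] args) {
        long returned = 1296518400000L; // first facet entry reported above

        System.out.println(at(returned, 0));  // midnight on 2011-02-01 in UTC
        System.out.println(at(returned, -8)); // 16:00 on 2011-01-31 at -08:00

        // The value the question expected instead: midnight at -08:00.
        long expected = toMillis("2011-02-01T00:00:00.000-08:00");
        System.out.println(expected - returned); // 28800000, i.e. 8 hours
    }
}
```

So the returned bucket start and the expected one are exactly eight hours apart.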


(Shay Banon) #2

Which version are you using?


(Eric Jain) #3

On Tue, Feb 21, 2012 at 02:43, Shay Banon kimchy@gmail.com wrote:

Which version are you using?

I was using 0.19.0.RC2. But I see the same behavior after upgrading to
0.19.0.RC3 (and replacing the call to .zone with .preZone).


(Shay Banon) #4

Yes, preZone applies the zone offset before rounding but still returns the result in UTC; you can add postZone to apply the offset after rounding as well. (This is new in 0.19. It fixes a problem where zone rounding did not work for resolutions finer than a day, and it lets the date histogram still return UTC dates, which you sometimes need, mainly to trick JavaScript chart libs :) ).


(Eric Jain) #5

On Tue, Feb 21, 2012 at 15:43, Shay Banon kimchy@gmail.com wrote:

Yes, preZone will apply the zone offset before rounding, but still return in
UTC, you can add postZone to apply it post rounding as well. (this is new in
0.19, and helps fix a problem where zone rounding did not work for lower
than day resolution, as well as enhance the date histogram to still return
UTC dates - where you sometimes need it, mainly to trick javascript chart
libs :) ).

So I do .preZone("-08:00").postZone("-08:00"), but
DateHistogramFacet.Entry.getTime() still returns the millis truncated
to the month using UTC.

btw is .preZone("-8") supposed to work? It's mentioned at
http://www.elasticsearch.org/guide/reference/api/search/facets/date-histogram-facet.html,
but is rejected, at least when using the Java API.


(Shay Banon) #6

What you will get with postZone is the start of the month in UTC, offset by the time zone. If you don't see that, please create a test that shows it. (Here is the test that is used in ES: https://github.com/elasticsearch/elasticsearch/blob/master/src/test/java/org/elasticsearch/test/integration/search/facet/SimpleFacetsTests.java#L1405).

And yes, in JSON the numeric value -2 is supported, but in Java it's a string value, so it always needs to be in time zone offset format (which is recommended).
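
If the offset is only available as a whole number of hours, one way to build the offset-format string is sketched below. This relies on the JDK's java.time.ZoneOffset, not on any elasticsearch helper, and the class and method names are hypothetical. The thread only shows negative offsets such as "-08:00" being passed to preZone, so the "+hh:mm" form for positive offsets is an assumption here.

```java
import java.time.ZoneOffset;

final class OffsetStrings {
    // Turns a whole-hour offset such as -8 into the "-08:00" form.
    static String offsetId(int hours) {
        return ZoneOffset.ofHours(hours).getId();
    }

    public static void main(String[] args) {
        System.out.println(offsetId(-8)); // -08:00
        System.out.println(offsetId(-2)); // -02:00
    }
}
```

The resulting string could then be handed to .preZone(...) in place of the rejected "-8".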


(Eric Jain) #7

On Fri, Feb 24, 2012 at 05:25, Shay Banon kimchy@gmail.com wrote:

What you will get with postZone is the start of the month in UTC offseted by
the time zone.

I expected the start of the month to be given using the specified time
zone rather than UTC, but it's not a big deal as the counts appear to
be correct.


(Shay Banon) #8

The start time will be expected to be UTC if no time zone information is provided in the date string. The date format already supports providing the relevant time zone.


(Eric Jain) #9

On Sun, Feb 26, 2012 at 11:34, Shay Banon kimchy@gmail.com wrote:

What you will get with postZone is the start of the month in UTC offseted by
the time zone.

If I do:

"date_histogram" : {
    "field" : "timestamp",
    "interval" : "month",
    "pre_zone" : "-08:00",
    "post_zone" : "-08:00"
}

I get 1296489600000L, which is 2011-01-31T08:00:00.000-08:00 or
2011-01-31T16:00:00.000Z.

To get 2011-02, I can either not set post_zone and work around the
fact that the millis returned represent 2011-02-01T00:00:00.000Z
instead of 2011-02-01T00:00:00.000-08:00, or else set post_zone to
+08:00.

This looks like a bug to me--unless I'm still misunderstanding how
this is supposed to work...
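
The arithmetic behind these numbers can be spelled out as follows. This is a sketch under one assumption: that post_zone simply adds its offset to the UTC-rounded millis, which is what the values reported in this thread suggest but is not confirmed against the elasticsearch source. The class and method names are made up.

```java
import java.time.Instant;
import java.util.concurrent.TimeUnit;

final class PostZoneArithmetic {
    // Assumed model of post_zone: add the offset (in hours) to the millis.
    static long shift(long epochMillis, int offsetHours) {
        return epochMillis + TimeUnit.HOURS.toMillis(offsetHours);
    }

    public static void main(String[] args) {
        long utcStartOfFeb = 1296518400000L; // 2011-02-01T00:00:00Z

        // post_zone = -08:00 gives the value reported above.
        System.out.println(shift(utcStartOfFeb, -8)); // 1296489600000

        // post_zone = +08:00 lands on midnight at -08:00 as a real instant.
        long plusEight = shift(utcStartOfFeb, 8);
        System.out.println(plusEight);                       // 1296547200000
        System.out.println(Instant.ofEpochMilli(plusEight)); // 2011-02-01T08:00:00Z
    }
}
```

2011-02-01T08:00:00Z is the same instant as 2011-02-01T00:00:00-08:00, which is why +08:00 produces the wanted bucket start under this model.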


(Shay Banon) #10

I think you have a problem in how you construct the date and test for it. The one without the preOffset will return the millis in UTC, and the one with the postOffset will return the time at an offset from UTC. Here is a simple test case: https://gist.github.com/1926706.


(Eric Jain) #11

On Mon, Feb 27, 2012 at 12:12, Shay Banon kimchy@gmail.com wrote:

I think you have a problem in how you construct the date and test for it.
The one without the preOffset will return the mills in UTC, and the one with
the postOffset will return the time in and offset against the UTC. Here is a
simple test case: https://gist.github.com/1926706.

I'm expecting 'new DateTime(facet2.getEntries().get(0).time(),
DateTimeZone.forOffsetHours(-8))' to be
'2012-01-01T00:00:00.000-08:00' ('utcExpected -
TimeUnit.HOURS.toMillis(-8)'), not
'2011-12-31T08:00:00.000-08:00' ('utcExpected - TimeUnit.HOURS.toMillis(8)').


(Shay Banon) #12

You are not using Joda correctly, check the javadoc for DateTime(long instant, DateTimeZone zone). The instant is expected to be UTC milliseconds. What you end up doing is getting back from elasticsearch the time already with the time zone (-8), and then "apply" it again when creating the DateTime (another -8).
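
The double shift can be reproduced in a small sketch. It uses the JDK's java.time instead of Joda (an assumption made to keep the example dependency-free); as with the Joda constructor, the millis are treated as a UTC instant and the offset only changes how that instant is displayed. All names are illustrative.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.concurrent.TimeUnit;

final class DoubleOffsetDemo {
    // Interprets the millis as a UTC instant and displays it at the offset.
    static OffsetDateTime view(long epochMillis, int offsetHours) {
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.ofHours(offsetHours));
    }

    public static void main(String[] args) {
        long utcExpected = Instant.parse("2012-01-01T00:00:00Z").toEpochMilli();

        // Value that already had -8 hours applied before it was returned.
        long shifted = utcExpected - TimeUnit.HOURS.toMillis(8);
        System.out.println(view(shifted, 0));  // 16:00 on 2011-12-31, one shift
        System.out.println(view(shifted, -8)); // 08:00 on 2011-12-31, shifted twice

        // The instant that displays as local midnight at -08:00.
        long localMidnight = utcExpected + TimeUnit.HOURS.toMillis(8);
        System.out.println(view(localMidnight, -8)); // 00:00 on 2012-01-01
    }
}
```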


(Eric Jain) #13

On Tue, Feb 28, 2012 at 03:02, Shay Banon kimchy@gmail.com wrote:

You are not using Joda correctly, check the javadoc for DateTime(long
instant, DateTimeZone zone). The instant is expected to be UTC milliseconds.
What you end up doing is getting back from elasticsearch the time already
with the time zone (-8), and then "apply" it again when creating the
DateTime (another -8).

I expected elasticsearch to always return UTC milliseconds, so that
explains part of the confusion. But even without postZone, it's not
clear to me why I need to add the timezone offset back to the result
to get the expected local time for year/month/day intervals, but not
for hour/minute intervals. See https://gist.github.com/1934291.


(Shay Banon) #14

First, your gist does not use postZone, but preZone. Let me explain the work that was done in 0.19 to improve the rounding with time zones (it's tricky).

Let's say you want to round "2012-04-01T04:15:30Z" (the UTC form of the value you use in your test, "2012-03-31T20:15:30-08:00").

First, let's do day-level rounding with preZone set to -8. "2012-04-01T04:15:30Z" - 8 falls on 2012-03-31, and it will be returned in UTC (since postZone defaults to UTC), so it will be "2012-03-31T00:00:00Z". So the result is rounded and returned in UTC (start of day in UTC).

Second, we do hour-based rounding with preZone set to -8. "2012-04-01T04:15:30Z" - 8 falls at "2012-03-31T20:15:30", and rounding it gives "2012-03-31T20:00:00"; but we want to return it in UTC, and the correct rounded time in UTC is "2012-04-01T04:00:00Z". Here we do add back the offset, so you get the proper rounded value in UTC.

So, you can just do preZone, and create a new DateTime with the returned time (with UTC time zone). It will be the UTC rounded value that you want. If you want to apply a postZone, then you can, and then you will get the mentioned results, just with the offset applied. So, for day you will get back "2012-03-31T00:00:00-08:00", and for hour you will get back: "2012-04-01T04:00:00-08:00".

I can see where the confusion is coming from. I can add a flag that would make elasticsearch behave the way you expect in the test, though the above behavior is probably what you want. I will add this flag just so we cover all the options for rounding.
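
The day and hour cases described above can be modeled in a few lines. This is a toy model of the behavior as explained in this post, written with the JDK's java.time; it is not the elasticsearch implementation, it leaves postZone out, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

final class PreZoneRoundingSketch {
    // Day rounding as described: shift into the pre-zone, floor to midnight,
    // then report that wall-clock value as if it were UTC (no add-back).
    static Instant roundToDay(Instant utc, ZoneOffset preZone) {
        LocalDateTime local = LocalDateTime.ofInstant(utc, preZone);
        return local.truncatedTo(ChronoUnit.DAYS).toInstant(ZoneOffset.UTC);
    }

    // Hour rounding as described: shift, floor to the hour, then add the
    // offset back so the result is the true UTC instant of that local hour.
    static Instant roundToHour(Instant utc, ZoneOffset preZone) {
        LocalDateTime local = LocalDateTime.ofInstant(utc, preZone);
        return local.truncatedTo(ChronoUnit.HOURS).toInstant(preZone);
    }

    public static void main(String[] args) {
        Instant value = Instant.parse("2012-04-01T04:15:30Z");
        ZoneOffset preZone = ZoneOffset.ofHours(-8);

        System.out.println(roundToDay(value, preZone));  // 2012-03-31T00:00:00Z
        System.out.println(roundToHour(value, preZone)); // 2012-04-01T04:00:00Z
    }
}
```

The only difference between the two methods is the offset used to turn the floored wall-clock time back into an instant, which is the asymmetry between day and hour intervals that the gist ran into.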


(Eric Jain) #15

On Tue, Feb 28, 2012 at 16:13, Shay Banon kimchy@gmail.com wrote:

I can see where the confusion is coming from, I can add a flag that would
make elasticsearch behave to what you expect in the test, though the above
behavior is probably what you want. I will add this flag just so we cover
all the options of rounding.

Thanks for the detailed explanation!

So if I index 2012-03-31T20:15:30-08:00 (or 2012-04-01T04:15:30Z), the
new flag would let me get:

2012-01-01T00:00:00-08:00 [interval=year, zone=-08:00]
2012-03-01T00:00:00-08:00 [interval=month, zone=-08:00]
2012-03-31T00:00:00-08:00 [interval=day, zone=-08:00]
2012-03-31T20:00:00-08:00 [interval=hour, zone=-08:00]
2012-03-31T20:15:00-08:00 [interval=minute, zone=-08:00]

Correct?
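
For reference, the five values listed above are what plain local-time truncation at a fixed -08:00 offset produces. The sketch below shows this with the JDK's java.time; it says nothing about how the proposed flag would be implemented, and the class and method names are made up.

```java
import java.time.OffsetDateTime;
import java.time.temporal.ChronoUnit;

final class LocalTruncationSketch {
    // Floors a timestamp to the start of the given unit in its own offset.
    static OffsetDateTime floor(OffsetDateTime t, ChronoUnit unit) {
        switch (unit) {
            case YEARS:
                return t.withDayOfYear(1).truncatedTo(ChronoUnit.DAYS);
            case MONTHS:
                return t.withDayOfMonth(1).truncatedTo(ChronoUnit.DAYS);
            default:
                return t.truncatedTo(unit); // DAYS, HOURS, MINUTES
        }
    }

    public static void main(String[] args) {
        OffsetDateTime indexed = OffsetDateTime.parse("2012-03-31T20:15:30-08:00");
        ChronoUnit[] units = {
            ChronoUnit.YEARS, ChronoUnit.MONTHS, ChronoUnit.DAYS,
            ChronoUnit.HOURS, ChronoUnit.MINUTES
        };
        for (ChronoUnit unit : units) {
            OffsetDateTime start = floor(indexed, unit);
            // Print the local bucket start and the UTC instant it denotes.
            System.out.println(unit + ": " + start + " = " + start.toInstant());
        }
    }
}
```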


(Shay Banon) #16

Yes. Though I still think that it's not what you would want; I guess you will see that when you try to graph it.


(Eric Jain) #17

On Wed, Feb 29, 2012 at 05:26, Shay Banon kimchy@gmail.com wrote:

Yes. Though I still think that its not what you would want, I guess you will
get to it when you try and graph it.

Perhaps I'm being stupid, but I don't see why I'd ever want
2012-03-31T20:15:30-08:00 (or 2012-04-01T04:15:30Z) rounded to the
hour to be 2012-04-01T04:00:00-08:00 rather than
2012-03-31T20:00:00-08:00.

Is there an existing ticket, or shall I create one?


(Shay Banon) #18

It's already there: https://github.com/elasticsearch/elasticsearch/issues/1744.

