Timezones in date histograms, redux


(Andrew Clegg) #1

Hi,

I've seen a lot of discussion of this in various old threads, and I've
spent about two hours going over the docs and code and brainstorming with
my colleagues, but for the life of me: I still can't make this work right.

Basically, we are doing date ranges with date histogram facets, and need to
support users in different timezones querying the same data set.

For each user, I want to show hourly, daily, weekly or monthly facets,
relative to their OWN timezone, starting at the range filter's lower bound.

The filter's boundaries are always exactly on an hour (in the user's time
zone), and for bucket sizes of daily or greater, the they'll will be at
00:00:00 (in the user's time zone).

So, I might come along and say "I want to see all data between 3 Jan and 8
Jan, by day" -- and if I'm in GMT it's easy:

{
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"range": {
"datetime": {
"gte": "2012-01-03T00:00:00Z",
"lt":"2012-01-07T00:00:00Z"
}
}
}
}
},
"facets": {
"histo" : {
"date_histogram" : {
"field" : "datetime",
"interval" : "1d"
}
}
}
}

This returns evenly-spaced days starting on Jan 3rd.

But, I am too dumb to make this work if the lower bound of the range filter
isn't on a midnight UTC moment. For example if it's midnight EST/05:00UTC.
Or for that matter, midnight British Summer Time. The server-side UTC
calculations always give us unexpected bucket boundaries.

We've been juggling very possible combination of pre_zone, post_zone,
pre_offset, post_offset etc., but just can't find a combination that will
give us regular hourly/weekly/daily/monthly intervals starting at a given
point in time like that.

Please help. I know it's possible but this stuff makes my head hurt.

Thanks :slight_smile:

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Mal Curtis) #2

Hi Andrew,

Did you ever figure this one out?

Currently we're thinking we're going to store date information both as the
UTC, and as the local time AS utc (i.e. 12.30 in +1200 becomes 12.30 UTC).
So we're denormalizing the time into two fields with different
representations of the same data.

-Mal

On Saturday, 15 June 2013 04:44:12 UTC+12, Andrew Clegg wrote:

Hi,

I've seen a lot of discussion of this in various old threads, and I've
spent about two hours going over the docs and code and brainstorming with
my colleagues, but for the life of me: I still can't make this work right.

Basically, we are doing date ranges with date histogram facets, and need
to support users in different timezones querying the same data set.

For each user, I want to show hourly, daily, weekly or monthly facets,
relative to their OWN timezone, starting at the range filter's lower bound.

The filter's boundaries are always exactly on an hour (in the user's time
zone), and for bucket sizes of daily or greater, the they'll will be at
00:00:00 (in the user's time zone).

So, I might come along and say "I want to see all data between 3 Jan and 8
Jan, by day" -- and if I'm in GMT it's easy:

{
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"range": {
"datetime": {
"gte": "2012-01-03T00:00:00Z",
"lt":"2012-01-07T00:00:00Z"
}
}
}
}
},
"facets": {
"histo" : {
"date_histogram" : {
"field" : "datetime",
"interval" : "1d"
}
}
}
}

This returns evenly-spaced days starting on Jan 3rd.

But, I am too dumb to make this work if the lower bound of the range
filter isn't on a midnight UTC moment. For example if it's midnight
EST/05:00UTC. Or for that matter, midnight British Summer Time. The
server-side UTC calculations always give us unexpected bucket boundaries.

We've been juggling very possible combination of pre_zone, post_zone,
pre_offset, post_offset etc., but just can't find a combination that will
give us regular hourly/weekly/daily/monthly intervals starting at a given
point in time like that.

Please help. I know it's possible but this stuff makes my head hurt.

Thanks :slight_smile:

--
Watch the fun new 2 minute and 2 second video tour of Vendhttp://youtu.be/zh2X8FyKyMI

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/ba013a5d-e351-4745-813a-703fa051820e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Nils) #3

What you are looking for is the only not-so-sensible-default I have thus
far found in Elasticsearch. By default it offsets the times to UTC to
easily create the day buckets. By setting 'pre_zone_adjust_large_interval'
to true the keys of your buckets will be in the requested timezone.

This also works for the newer aggregations.

-- Nils

On Thursday, May 1, 2014 10:48:49 PM UTC+2, Mal Curtis wrote:

Hi Andrew,

Did you ever figure this one out?

Currently we're thinking we're going to store date information both as the
UTC, and as the local time AS utc (i.e. 12.30 in +1200 becomes 12.30 UTC).
So we're denormalizing the time into two fields with different
representations of the same data.

-Mal

On Saturday, 15 June 2013 04:44:12 UTC+12, Andrew Clegg wrote:

Hi,

I've seen a lot of discussion of this in various old threads, and I've
spent about two hours going over the docs and code and brainstorming with
my colleagues, but for the life of me: I still can't make this work right.

Basically, we are doing date ranges with date histogram facets, and need
to support users in different timezones querying the same data set.

For each user, I want to show hourly, daily, weekly or monthly facets,
relative to their OWN timezone, starting at the range filter's lower bound.

The filter's boundaries are always exactly on an hour (in the user's time
zone), and for bucket sizes of daily or greater, the they'll will be at
00:00:00 (in the user's time zone).

So, I might come along and say "I want to see all data between 3 Jan and
8 Jan, by day" -- and if I'm in GMT it's easy:

{
"query": {
"filtered" : {
"query" : {
"match_all" : {}
},
"filter" : {
"range": {
"datetime": {
"gte": "2012-01-03T00:00:00Z",
"lt":"2012-01-07T00:00:00Z"
}
}
}
}
},
"facets": {
"histo" : {
"date_histogram" : {
"field" : "datetime",
"interval" : "1d"
}
}
}
}

This returns evenly-spaced days starting on Jan 3rd.

But, I am too dumb to make this work if the lower bound of the range
filter isn't on a midnight UTC moment. For example if it's midnight
EST/05:00UTC. Or for that matter, midnight British Summer Time. The
server-side UTC calculations always give us unexpected bucket boundaries.

We've been juggling very possible combination of pre_zone, post_zone,
pre_offset, post_offset etc., but just can't find a combination that will
give us regular hourly/weekly/daily/monthly intervals starting at a given
point in time like that.

Please help. I know it's possible but this stuff makes my head hurt.

Thanks :slight_smile:

Watch the fun new 2 minute and 2 second video tour of Vendhttp://youtu.be/zh2X8FyKyMI

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/09458726-cc58-4ad5-bb19-cb53b2542b80%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #4