Filtering/aggregating by a timezone field in each document instead of the same timezone for all documents

oeed · July 30, 2019, 11:08pm

I'm fairly new to Elasticsearch so I'm not 100% sure it's appropriate for our use case, so let me know if it isn't.

We are performing some basic analytics for exposure time from specific locations. I want to be able to filter and aggregate these based on the local time (i.e. total exposure time at 9am, 10am, etc.). Each location can have different timezones and, importantly, some time zones have half-hour offsets (specifically Adelaide) - meaning I can't easily convert the timezones later on our application as histogram intervals in half-hour timezones would run from 9:30am - 10:30am for example. (I hope that makes sense but happy to clarify)

My question is, can I somehow 'join' in a field from each document and use that in the date_histogram time_zone aggregation. For example, some abridged documents look like:

POST delivery-abcd-1234/_doc/
{
  "date": "2019-07-30T22:39:52.678Z",
  "exposureTime": 4989,
  "siteID": "TestSite1",
  "timezone": "Australia/Adelaide"
}

POST delivery-abcd-1234/_doc/
{
  "date": "2019-07-30T22:42:16.876Z",
  "exposureTime": 5012,
  "siteID": "TestSite2",
  "timezone": "Australia/Sydney"
}

I then aggregate them like below. I want to somehow replace the constant "Australia/Adelaide" with some reference to the timezone in each document. I would also like to use a range filter in a similar way if possible.

POST delivery-abcd-1234/_doc/_search?size=0
{
  "aggregations": {
    "by_interval": {
      "date_histogram" : {
        "field" : "date",
        "interval" : "hour",
        "time_zone": "Australia/Adelaide"
      },
      "aggregations": {
        "total_exposure": {
          "sum": { "field": "exposureTime" }
        }
      }
    }
  }
}

Is something like this possible/wise? Are there any good alternative ways? I was wondering if perhaps having an index per timezone would be a better approach (although ideally, I wouldn't denormalise the timezone into the document, instead join it from another index).

polyfractal · August 2, 2019, 8:21pm

I would store the original dates with the timezone included. Date fields are time zone aware, and will be stored internally as whatever the UTC equivalent is. But right now that information is being "lost" because you're storing the dates as UTC ("Z") and keeping the timezone in a separate field.

Available date patterns are here, and you can specify entirely custom patterns using java-time format. In this case it looks like you'd probably want V for the official time-zone IDs like Australia/Adelaide

Then when you aggregate, the aggregation is working on a normalized UTC version. The time_zone parameter is used to shift around the buckets so they make sense to whoever is viewing the results, rather than trying to munge the data from different timezones into one zone.

Keeping the separate timezone field is fine too, it's sometimes useful to do a search like "find all documents that came from Australia/Sydney", but I would also include it in the timestamp itself.

There are probably ways to do what you asked, in particular you can use a script on the date_histogram to munge up the returned value... but it sounds necessary in this case (I think)

system · August 30, 2019, 8:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Aggregation with different timezones Elasticsearch	2	293	August 17, 2021
Timezone-affected date histogram Elasticsearch	3	3715	November 28, 2018
Time Zone Elasticsearch	1	223	July 6, 2017
When i use date_histogram it use utc time Elasticsearch	4	431	May 14, 2018
Time Zone Elasticsearch	6	1665	July 6, 2017

Filtering/aggregating by a timezone field in each document instead of the same timezone for all documents

Related topics