Filtering/aggregating by a timezone field in each document instead of the same timezone for all documents

I'm fairly new to Elasticsearch so I'm not 100% sure it's appropriate for our use case, so let me know if it isn't.

We are performing some basic analytics for exposure time from specific locations. I want to be able to filter and aggregate these based on the local time (i.e. total exposure time at 9am, 10am, etc.). Each location can have different timezones and, importantly, some time zones have half-hour offsets (specifically Adelaide) - meaning I can't easily convert the timezones later on our application as histogram intervals in half-hour timezones would run from 9:30am - 10:30am for example. (I hope that makes sense but happy to clarify)

My question is, can I somehow 'join' in a field from each document and use that in the date_histogram time_zone aggregation. For example, some abridged documents look like:

POST delivery-abcd-1234/_doc/
{
  "date": "2019-07-30T22:39:52.678Z",
  "exposureTime": 4989,
  "siteID": "TestSite1",
  "timezone": "Australia/Adelaide"
}

POST delivery-abcd-1234/_doc/
{
  "date": "2019-07-30T22:42:16.876Z",
  "exposureTime": 5012,
  "siteID": "TestSite2",
  "timezone": "Australia/Sydney"
}

I then aggregate them like below. I want to somehow replace the constant "Australia/Adelaide" with some reference to the timezone in each document. I would also like to use a range filter in a similar way if possible.

POST delivery-abcd-1234/_doc/_search?size=0
{
  "aggregations": {
    "by_interval": {
      "date_histogram" : {
        "field" : "date",
        "interval" : "hour",
        "time_zone": "Australia/Adelaide"
      },
      "aggregations": {
        "total_exposure": {
          "sum": { "field": "exposureTime" }
        }
      }
    }
  }
}

Is something like this possible/wise? Are there any good alternative ways? I was wondering if perhaps having an index per timezone would be a better approach (although ideally, I wouldn't denormalise the timezone into the document, instead join it from another index).

I would store the original dates with the timezone included. Date fields are time zone aware, and will be stored internally as whatever the UTC equivalent is. But right now that information is being "lost" because you're storing the dates as UTC ("Z") and keeping the timezone in a separate field.

Available date patterns are here, and you can specify entirely custom patterns using java-time format. In this case it looks like you'd probably want V for the official time-zone IDs like Australia/Adelaide

Then when you aggregate, the aggregation is working on a normalized UTC version. The time_zone parameter is used to shift around the buckets so they make sense to whoever is viewing the results, rather than trying to munge the data from different timezones into one zone.

Keeping the separate timezone field is fine too, it's sometimes useful to do a search like "find all documents that came from Australia/Sydney", but I would also include it in the timestamp itself.

There are probably ways to do what you asked, in particular you can use a script on the date_histogram to munge up the returned value... but it sounds necessary in this case (I think)

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.