Facet returns offset values even when UTC is set


(Brian Hicks) #1

Hi there. We're running Logstash to get logs into Elasticsearch, where
we query them and generate reports using facets. We're running into some
problems where we're seeing offset times returned from facets, when we
really just want UTC.

Logstash stores our records in UTC, so:

curl -XPOST "http://our-es-host/analytics-2014.03.*/_search" -d'
{
     "size": 1
}'

returns something like this (I've elided irrelevant fields):

{
    "took": 108,
    "timed_out": false,
    "_shards": {
       "total": 62,
       "successful": 62,
       "failed": 0
    },
    "hits": {
       "total": 5890189,
       "max_score": 1,
       "hits": [
          {
             "_index": "analytics-2014.03.29",
             "_type": "analytics",
             "_id": "BHpbrZSDSiCZBlOW-pteOA",
             "_score": 1,
             "_source": {
                "@version": "1",
                "@timestamp": "2014-03-29T00:00:14.000+00:00",
                "params": {
                   "tag": [
                      "impression"
                   ]
                }
             }
          }
       ]
    }
}

So our reports basically look like this:

curl -XPOST "http://our-es-host/analytics-2014.03.*/_search" -d'
{
    "query": {
       "bool": {
          "must": [
             {
                "range": {
                   "@timestamp": {
                      "from": "2014-03-01",
                      "to": "2014-03-31"
                   }
                }
             },
             {
                "term": {
                   "params.tag": {
                      "value": "print"
                   }
                }
             }
          ]
       }
    },
    "facets": {
       "prints": {
          "date_histogram": {
             "key_field": "@timestamp",
             "interval": "day",
             "time_zone": "UTC",
             "pre_zone_adjust_large_interval": true
          }
       }
    },
    "size": 0
}'

As a note: I know the performance of the facets could be an issue, so
we cache each day's result and generate final reports from that.

So everything should be fine there, as far as I can tell. There should
be no time zone information other than UTC anywhere in that query. But
these are the results (again elided a bit, for length):

{
    "took": 398,
    "timed_out": false,
    "_shards": {
       "total": 62,
       "successful": 62,
       "failed": 0
    },
    "hits": {
       "total": 669242,
       "max_score": 4.014433,
       "hits": []
    },
    "facets": {
       "prints": {
          "_type": "date_histogram",
          "entries": [
             {
                "time": 1395964800000,
                "count": 29527
             },
             {
                "time": 1396051200000,
                "count": 26806
             },
             {
                "time": 1396137600000,
                "count": 19802
             },
             {
                "time": 1396224000000,
                "count": 20382
             }
          ]
       }
    }
}

Converting those four records to human-readable strings yields the
following:

2014-03-27 19:00:00
2014-03-28 19:00:00
2014-03-29 19:00:00
2014-03-30 19:00:00

That's not what we want: the times should be at midnight. It throws
our reports off by a day. Normally we could just correct for this by
adding post_zone, and we've done that in the past, but the DST change
is throwing things off even more, so I'm trying to find the root cause
so we can get the right data out of ES the first time. The same thing
happens if we don't specify a time zone, or if we leave out
pre_zone_adjust_large_interval. I could understand this behavior if I
were sending my locale, but curl isn't adding anything to the headers,
and running the same command from our EC2 hosts (US-East-1, kept on UTC
by ntpd) gives the same response.
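
As a cross-check on the conversion above, here is a minimal Python sketch
(not from the original post) that renders the four bucket times with an
explicit UTC zone and, for comparison, with a fixed UTC-5 offset. The
helper name millis_to_datetime and the UTC-5 offset are assumptions chosen
for illustration; the offset is simply the one that reproduces the 19:00
strings listed above. Under these assumptions the bucket times fall exactly
on UTC midnight, which suggests the shift may come from the tool doing the
conversion rather than from the facet itself.

```python
from datetime import datetime, timedelta, timezone

# Bucket times copied from the facet response (epoch milliseconds).
BUCKET_MILLIS = [1395964800000, 1396051200000, 1396137600000, 1396224000000]

MILLIS_PER_DAY = 24 * 60 * 60 * 1000

# Assumption for illustration only: a fixed UTC-5 offset, picked because it
# reproduces the 19:00 strings quoted in the post.
ASSUMED_LOCAL = timezone(timedelta(hours=-5))


def millis_to_datetime(ms, tz=timezone.utc):
    """Convert epoch milliseconds to a timezone-aware datetime in tz."""
    return datetime.fromtimestamp(ms / 1000, tz=tz)


for ms in BUCKET_MILLIS:
    utc_text = millis_to_datetime(ms).strftime("%Y-%m-%d %H:%M:%S")
    local_text = millis_to_datetime(ms, ASSUMED_LOCAL).strftime("%Y-%m-%d %H:%M:%S")
    # A remainder of 0 means the value sits exactly on a UTC day boundary.
    on_utc_midnight = ms % MILLIS_PER_DAY == 0
    print(f"{ms}  UTC: {utc_text}  UTC-5: {local_text}  UTC midnight: {on_utc_midnight}")
```

If the UTC column shows midnight while the offset column shows 19:00 on
the previous day, the conversion step is applying a local zone, and passing
an explicit UTC zone when formatting should be enough to line the report
days up.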

Is this something anyone has experienced before? Am I misunderstanding
the time_zone parameter of the facets?

Thanks,
Brian Hicks



(brianhicks) #2

Does anyone have any insight into this? It has been seriously hurting our production use of ES ever since we discovered the behavior. Can we just turn off time zones in ES?

