It would be great if someone shared their best practices for storing dates in ES in general, but I have a specific question too.
We mostly use dates without time and map them as type "date" with format "date" in ES, and we send them in the short form "2016-07-25", which from our perspective is local time. Will such a short format be converted to UTC, so that the date may change in ES and aggregations on the date may produce unintended results?
I suspect queries will work because ES would convert the date in a query the same way, but aggregations would be affected.
Should we be sending dates to Elasticsearch in full UTC even though we only need the date part? That would also mean passing full UTC dates to queries?
I would appreciate it if you shared how you handle dates in similar scenarios.
Internally, when you map any field as a date type, it is converted to an epoch-milliseconds timestamp. This is how Elasticsearch is able to perform range queries against dates. In the _source, however, your date will remain in the string representation you submitted.
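To see what that conversion looks like, here's a small Python sketch (not using Elasticsearch itself, just reproducing the behaviour described above: a date-only string is parsed as midnight UTC and stored as epoch milliseconds):

```python
from datetime import datetime, timezone

# A format:"date" value like "2016-07-25" carries no zone information,
# so it is treated as midnight UTC before the epoch conversion.
doc_date = "2016-07-25"
parsed = datetime.strptime(doc_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
epoch_ms = int(parsed.timestamp() * 1000)
print(epoch_ms)  # 1469404800000
```

If the sender actually meant midnight in some local zone, this epoch value is off by that zone's offset, which is exactly where aggregation buckets can shift.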
Thank you Aaron. Is there no way to configure Elasticsearch to use the current locale?
We can certainly send full UTC or ISO dates to ES, but it is much harder to ensure that all applications performing queries and aggregations via REST also use the full UTC format. Both the data and the consumer apps have a single-time-zone mentality and are not used to full UTC dates.
In order for aggregations and queries to work properly, things will have to be in UTC, regardless of whether they are in ISO8601 or year-month-day format. This is, as mentioned previously, because internally the date will be converted and stored in the Lucene data structure as epoch+milliseconds. If the date is improperly stored (e.g. the date represents local time instead of UTC) then queries and aggregations will be incorrect. I'm not sure how you will need to address this on your end, but something will need to be done to ensure UTC for consistency.
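One way to ensure that on the client side is to attach the local zone to each date before indexing and convert it to UTC. A Python sketch, assuming a hypothetical local zone of Europe/Warsaw (UTC+2 in summer due to DST):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+

# "2016-07-25" is meant as a local date in an assumed Europe/Warsaw zone.
local_midnight = datetime(2016, 7, 25, tzinfo=ZoneInfo("Europe/Warsaw"))
utc_value = local_midnight.astimezone(ZoneInfo("UTC"))
# Index the full UTC timestamp instead of the bare date string:
print(utc_value.isoformat())  # 2016-07-24T22:00:00+00:00
```

Note that the UTC date can differ from the local date (here July 24 vs. July 25), which is precisely why sending bare local dates as if they were UTC corrupts aggregations.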
Logstash can normalize dates, if that is an option for your use case.
No, Logstash is not an option. We have very large and complex documents, assembled from several large relational databases, that are sent to ES as the data in them changes.
I was hoping that ES could be told to normalize dates when parsing JSON.
Say you specify a time zone for an index (or the whole cluster), and ES would convert non-fully-qualified dates to UTC based on that time zone rather than assuming they are already in UTC.
ES stores stuff as UTC, there is no way around that.
The best option would be to pass in the TZ with the timestamp, which ES will then use when converting to epoch.
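For example, an ISO8601 timestamp that carries its own offset already contains everything needed to derive the correct UTC epoch. A Python sketch of that conversion, using an assumed +03:00 local offset:

```python
from datetime import datetime

# A timestamp with an explicit offset; converting it to epoch
# milliseconds yields the same instant regardless of the sender's zone.
stamp = "2016-07-25T00:00:00+03:00"
parsed = datetime.fromisoformat(stamp)
epoch_ms = int(parsed.timestamp() * 1000)
print(epoch_ms)  # 1469394000000 (i.e. 2016-07-24T21:00:00 UTC)
```

Clients can then keep thinking in local time; as long as the offset travels with the value, the stored epoch stays consistent.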