I originally posted this over on github, where I was told this behavior was not a bug and was encouraged to seek help here instead.
I am using AWS ElasticSearch Service, version 6.2.
I am trying to get the hour of the day of a date field.
The data sent to the date field includes timezone/offset information, I would like this to be accounted for. (Example: The hour of the day of 2018-06-29T19:31:30-03:00 would be 19)
I would like to use painless, as all other languages are turned off.
Example date-time information I am putting into the lts date field:
The results I am getting:
0 hour has 2 doc count
22 hour has 1 doc count
The results I am expecting:
2 hour has 1 doc count
8 hour has 1 doc count
19 hour has 1 doc count
Is there a way to get what I want? (ie: localized hour-of-day)
I have read elsewhere that I may be able to parse a zoned-date-time using painless scripting, however that requires a string/character-array. Is there a way in painless to access the document's original source, in order to re-parse it?
It would be much more efficient to compute that at index time instead.
Then, if you don't care about the TZ, may be just omit it when writing your documents. Elasticsearch will consider that all dates are in the same TZ (UTC).
Cloud by elastic is one way to have access to all features, all managed by us. Think about what is there yet like Security, Monitoring, Reporting, SQL and what is coming like Canvas...
Are you saying to compute and store the localized hour-of-day at index time? Is there a way for ElasticSearch to compute this and create it as a secondary field or something, at index time? Do you have an example of this I could see?
At any rate, hour-of-day was only an example. We also want day-of-week and some other calculations as well. With how much storage is costing us, I'd prefer to just store my date-time as a single timezone-aware datatype, and be able to compute everything later at query/aggregation time. Unfortunately ElasticSearch doesn't have a timezone-aware date datatype, which seems like a major omission...
I do understand that the date field type stores everything as UTC time. (Though I did not realize that when originally creating the template/mapping and my data.) However, that is the datatype my date data is stored under right now, so I'd like to know if there is a way to get what I need now...
Can the original source/text be accessed in a script, so that I can reparse it as a timezone aware datatype in the script?
Yes. That's what I meant. It's always preferable to do expensive computations at index time rather than at search time. In your case, you need to visit all the documents from the resultset to compute buckets. It's a way much faster if this information is already stored in the document.
Is there a way for Elasticsearch to compute this and create it as a secondary field or something, at index time?
At any rate, hour-of-day was only an example. We also want day-of-week and some other calculations as well.
If you know from he beginning what you are going to compute, then do that at index time.
With how much storage is costing us, I'd prefer to just store my date-time as a single timezone-aware datatype, and be able to compute everything later at query/aggregation time.
That's fine if you are ok with slower response time.
Unfortunately Elasticsearch doesn't have a timezone-aware date datatype
Hmmm. Not sure I'm following. I mean that when you specify a timezone, elasticsearch is taking it into account. If you don't, elasticsearch supposes it's UTC time.
You can at index time have both versions with ingest again. Look at Grok Processor | Elasticsearch Guide [6.4] | Elastic.
@veqryn If I understand your original problem correctly, you want to return the hour of the day in whatever timezone the time was originally specified in? I would do this by storing the timezone in a separate field. You can then adjust the UTC time pulled from the field in your script by looking up your timezone field. You could also extract the timezone field from your date field in an ingest script.
@dadoonet: I'm not sure what your example of "date and utc date" would accomplish, if any time I try to pull the data out it comes out in UTC time.
@rjernst: Yes, you are correct.
I have already started storing the timezone data as a separate field, and that solves the problem for new indexes.
However, I'd still like to get an answer to my original question, so that I can deal with the data that I already have in indexes that are already created.
Is there a way to see the original date string from a script?
Is there any way for me get the hour-of-day on data that was input in RFC3339, in the timezone/location of that datetime?
It depends on what type of script you are using. Eg if you are just using return fields, then sure, the _source is available. You could parse this, instead of using doc values. I would recommend against this, though, if you are doing this in an aggregation. If you just need to update older docs to include your new fields, then I would try using reindex with a script (again, you will have access to _source).
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.