My daily index is always created at 8 or 9 AM, which I believe is because it follows UTC time.
This means my index thor-2016.10.12 only contains data from 8 AM onwards, and thor-2016.10.11 contains data up to 8 AM the next day. This affects our data analysis.
Is there any way I can have the daily index created at 12 AM in my local time (UTC+8)? By the way, my server time is already set to local time.
Yup, we use Logstash. Is there any way this can be implemented in the next version of Logstash? It is a basic requirement for time-based data. For example, it could be an additional configuration option in the elasticsearch output plugin.
Maybe it doesn't matter as long as all the indices required for that time period exist, but it does matter if you have a daily purge running behind the scenes, because the purge works per index, not per actual date and time. If I have the indices below and my data retention is 2 days, thor-2016.10.12 will be deleted. After it is deleted, the only data I have left is everything from 10/13 8 AM onward (my 12 AM to 7 AM data is gone!). Do you understand what I mean? The index name doesn't match the data actually inside it.
Generally, I just don't understand why a query for 10/13 1 AM to 11 AM data needs to hit both thor-2016.10.12 and thor-2016.10.13. If the YYYY.MM.DD doesn't represent what it should represent, what is the point of putting it there?
That's because Elasticsearch expects all timestamps to be in UTC. Logstash simply mirrors this, and that's why the daily index rolls over at midnight UTC. If Elasticsearch gets timestamps that are not in UTC (and do not have a proper offset affixed), it can and will cause problems, because other time data will be in UTC, and then there will be conflicts.
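To make the UTC rollover concrete, here is a minimal Python sketch of how a daily index name follows from an event's UTC date. The `daily_index` helper and the fixed UTC+8 offset are illustrative assumptions based on the question; this is not Logstash's actual code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the poster's local zone is a fixed UTC+8 offset.
LOCAL = timezone(timedelta(hours=8))

def daily_index(event_time: datetime, prefix: str = "thor") -> str:
    """Name the daily index after the UTC date of the event (hypothetical helper)."""
    utc_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{utc_time:%Y.%m.%d}"

# One minute before and exactly at 8 AM local time on 2016-10-12.
before = datetime(2016, 10, 12, 7, 59, tzinfo=LOCAL)
after = datetime(2016, 10, 12, 8, 0, tzinfo=LOCAL)

print(daily_index(before))  # thor-2016.10.11
print(daily_index(after))   # thor-2016.10.12
```

The event at 7:59 AM local time is still 23:59 UTC on the previous day, which is why it lands in the earlier index.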
I understand the natural desire to have data compartmentalized into logical containers that make sense. Elasticsearch provides ways to help you get around this particular UTC vs. local time conundrum. In the future, you may not even want to use indices with a date stamp in the name (though you're always free to do so if you choose). See the new Rollover API in the Elasticsearch documentation to understand what I'm talking about. Bear in mind that this approach relies on the field_stats API to find out which index holds which data.
The field_stats API helps to isolate which indices have the data that you want to analyze. This is what Kibana does under the hood when you provide an index pattern. You select a time window in the picker, and it queries only those indices which have data in the time window. After using the field_stats API to select indices, you would just use a range filter for the timeframe you want.
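As a rough illustration of the "select indices, then range-filter" idea, the sketch below works out which UTC-named daily indices a local time window touches and builds a range filter for it. It does not call the field_stats API; it only derives the same answer from the daily naming convention. The helper names are hypothetical, and `@timestamp` is assumed to be the time field, as in a default Logstash setup.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the poster's local zone is a fixed UTC+8 offset.
LOCAL = timezone(timedelta(hours=8))

def indices_for_window(start, end, prefix="thor"):
    """List the daily (UTC-named) indices that can hold events in [start, end)."""
    first = start.astimezone(timezone.utc).date()
    # The end bound is exclusive, so step back one microsecond before taking the date.
    last = (end.astimezone(timezone.utc) - timedelta(microseconds=1)).date()
    days = (last - first).days
    return [f"{prefix}-{first + timedelta(days=n):%Y.%m.%d}" for n in range(days + 1)]

def range_filter(start, end, field="@timestamp"):
    """Build a range filter body; ISO 8601 strings carry the +08:00 offset."""
    return {"range": {field: {"gte": start.isoformat(), "lt": end.isoformat()}}}

# The window from the question: 10/13 1 AM to 11 AM local time.
start = datetime(2016, 10, 13, 1, 0, tzinfo=LOCAL)
end = datetime(2016, 10, 13, 11, 0, tzinfo=LOCAL)

print(indices_for_window(start, end))  # ['thor-2016.10.12', 'thor-2016.10.13']
print(range_filter(start, end))
```

1 AM local on 10/13 is 17:00 UTC on 10/12, so the window genuinely spans two UTC days, and the range filter is what trims the result to the hours you asked for.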
If you have an index retention policy issue, Curator 4 has an age filter that uses the field_stats API. That way nothing gets purged before you want it to be.
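To see what a purge keyed on the index name actually removes, this small sketch converts a daily index name back into the local time span it covers. The `local_span` helper and the UTC+8 offset are assumptions for illustration; Curator itself is not involved here.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the poster's local zone is a fixed UTC+8 offset.
LOCAL = timezone(timedelta(hours=8))

def local_span(index_name, prefix="thor"):
    """Return the (start, end) of a UTC-named daily index in local time."""
    day = datetime.strptime(index_name, f"{prefix}-%Y.%m.%d").replace(tzinfo=timezone.utc)
    return day.astimezone(LOCAL), (day + timedelta(days=1)).astimezone(LOCAL)

# If thor-2016.10.12 is deleted, the oldest surviving index is thor-2016.10.13.
oldest_start, oldest_end = local_span("thor-2016.10.13")
print(oldest_start.isoformat())  # 2016-10-13T08:00:00+08:00
print(oldest_end.isoformat())    # 2016-10-14T08:00:00+08:00
```

This matches the situation described in the question: once thor-2016.10.12 is gone, the surviving data starts at 8 AM local time on 10/13. Filtering on the age of the data rather than on the name avoids purging an index that still holds hours you want to keep.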
As a matter of fact, you could theoretically use the rollover API to force rollover at midnight local time. This would be one way you could get indices to be completely compartmentalized into daily batches, but still have UTC data in them.
This would mean installing Elastic Stack 5.0, however, as the rollover API only becomes available in Elasticsearch 5.0.
The downside of this is that they wouldn't have a date stamp in the name. They'd be index-0001, index-0002, index-0003, etc., so you'd still need the field_stats API to identify which index had which data.
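If you wanted to try the midnight-local rollover idea, the scheduling side could look roughly like the sketch below. It only computes when the next local midnight falls in UTC; the actual rollover request (against the `_rollover` endpoint of your write alias) is left out, and the `next_local_midnight` helper and the UTC+8 offset are assumptions for illustration.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: the poster's local zone is a fixed UTC+8 offset (no DST).
LOCAL = timezone(timedelta(hours=8))

def next_local_midnight(now: datetime) -> datetime:
    """Return the next local midnight, expressed in UTC (hypothetical helper)."""
    local_now = now.astimezone(LOCAL)
    tomorrow = local_now.date() + timedelta(days=1)
    midnight = datetime.combine(tomorrow, time.min, tzinfo=LOCAL)
    return midnight.astimezone(timezone.utc)

now = datetime(2016, 10, 12, 10, 30, tzinfo=timezone.utc)
run_at = next_local_midnight(now)
print(run_at.isoformat())  # 2016-10-12T16:00:00+00:00
print(run_at - now)        # 5:30:00
```

A cron job or scheduler would trigger the rollover call at `run_at`; the documents inside each index would still carry UTC timestamps.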