I have multiple log files with the date in their file names, and I want to read them with the file input plugin and send them to an Elasticsearch index (the index name contains the date).
However, I have some logs being sent to the wrong index. For example, the log with timestamp 2022/08/10 is sent to the index log-2022.08.09.
I found that the timestamps of all the logs that are sent to the wrong index are between 00:00:00 and 08:00:00, which matches the timezone (UTC+8) I am in.
Therefore, I assume I am missing some setting.
What is causing this problem?
That sounds like it is working as expected. LogStash::Timestamp objects (and elasticsearch) are always stored as UTC. That means indexes roll over at midnight UTC, not midnight local time.
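To make the rollover concrete, here is a minimal sketch in plain Ruby (not Logstash itself). The helper name utc_index_name and the log- prefix are assumptions for illustration; the +08:00 offset comes from the question.

```ruby
# Hypothetical helper: build a daily index name from the UTC date of a timestamp,
# which mirrors how a date-based index name behaves when timestamps are UTC.
def utc_index_name(time, prefix = "log-")
  prefix + time.getutc.strftime("%Y.%m.%d")
end

# A log line stamped 2022/08/10 01:30:00 in UTC+8.
early = Time.new(2022, 8, 10, 1, 30, 0, "+08:00")
# A log line stamped 2022/08/10 08:00:00 in UTC+8, which is midnight UTC.
later = Time.new(2022, 8, 10, 8, 0, 0, "+08:00")

puts utc_index_name(early)  # still the previous UTC day
puts utc_index_name(later)  # first instant of the new UTC day
```

Anything logged between 00:00:00 and 08:00:00 local time still belongs to the previous day in UTC, which is the pattern described in the question.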
The entire stack is built on the assumption that dates are UTC. You can change the timezone on the date filter to force elasticsearch to store dates in your local timezone, but don't be surprised if some things break when you do that.
By default Kibana will adjust the UTC dates from elasticsearch to be in the local timezone of the user's browser, although it can be configured to move them to any timezone you please.
I know it's because Kibana adjusts the time according to the local timezone, but since my logs are generated in timezone UTC+8, I think removing the timezone config to use UTC time (the default) and configuring Kibana to show timestamps in UTC is just a workaround.
If there isn't any way to make the %{+YYYY.MM.dd} of the Elasticsearch output plugin use the local timestamp, I think I will write a ruby filter to create a custom field for the index name.
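A rough sketch of what that ruby filter would compute, written as plain Ruby so it can be run outside Logstash. The helper name local_index_date and the fixed +08:00 offset are assumptions; a fixed offset is only adequate for a timezone without daylight saving time.

```ruby
# Hypothetical helper: format the date of a UTC timestamp as seen in a fixed
# UTC offset, for use as the date part of an index name.
def local_index_date(utc_time, utc_offset = "+08:00")
  utc_time.getlocal(utc_offset).strftime("%Y.%m.%d")
end

# 2022-08-09 17:30 UTC is 2022-08-10 01:30 in UTC+8.
stored = Time.utc(2022, 8, 9, 17, 30, 0)
puts "log-" + local_index_date(stored)
```

Inside a Logstash ruby filter the same expression would presumably be applied to event.get("@timestamp").time, with the result stored via event.set in a field such as [@metadata][index_date] (a hypothetical field name) and referenced from the elasticsearch output as index => "log-%{[@metadata][index_date]}". Treat that wiring as untested.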
Why do you care which index the data is written to? If you are trying to make sure that queries for a particular day only get sent to one index then I think that is a micro-optimization that is rarely worth it. I believe date related queries for an index that contains nothing in the date range are very fast.
Kibana is normally configured to query a wildcard set of indexes. I never noticed that the size of that set significantly impacted the cost of the query. My experience was only comparing a single index with ~30 indexes for a month, but it didn't feel slower.
I know there could be use cases where a call to the elasticsearch REST API might make it easier to code the call to a single index, but with Kibana I think that is all handled for you.
@stephenb I know some folks like the idea that their personal day matches the index day. Is there any Elastic commentary / blog / documentation on this?
Currently, we have an ELK stack deployed with the OSS version, which doesn't have lifecycle management (if I have misunderstood this, please let me know).
The reason I care about which index a document is written to is that I want to purge old indices based on their date. For now, I use the index name to check which indices are old enough to be purged.
Also, one of our ELK stacks is deployed in a private network, and if I want to take a snapshot to bring one of the indices to another Elasticsearch instance, I have to make sure I choose the right index.
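The name-based check looks roughly like the sketch below. The helper name expired_indices, the log- prefix, and the 30-day retention are assumptions for illustration; it only selects names and does not call Elasticsearch.

```ruby
require "date"

# Hypothetical helper: return the index names whose date suffix is more than
# keep_days before today. Names without the prefix or a valid date are skipped.
def expired_indices(names, today, keep_days, prefix = "log-")
  cutoff = today - keep_days
  names.select do |name|
    next false unless name.start_with?(prefix)
    begin
      Date.strptime(name[prefix.length..-1], "%Y.%m.%d") < cutoff
    rescue ArgumentError
      false # suffix is not a date in YYYY.MM.dd form
    end
  end
end

names = ["log-2022.07.01", "log-2022.08.09", "log-2022.08.10", "other-index"]
puts expired_indices(names, Date.new(2022, 8, 10), 30)
```

Because the date in the name decides what gets purged, a document written to the previous day's index is removed a day earlier than its local date would suggest.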
@Badger I am not sure exactly what you are referring to...
BUT if you are asking whether Kibana / Elasticsearch will only search the appropriate time series indices / shards, the answer is yes (or can be yes).
Certainly you can always search specific indices to limit the scope.
But with ILM (which is part of the Free / Basic license), Elasticsearch will only search the appropriate shards. Interestingly, I was having a discussion on this very topic today. So no, a user does not need to know which index / shards to query with respect to the time range of the data they hold, as the cluster state keeps track of that.
I would need to check to see if ordinary "Daily Indices" non ILM are treated that way.
Quick example of a 15 min query I ran today against 90 days' worth of data (~0.25 PB):
Note elasticsearch knew to skip 2370 of the total 2382 Shards that did not fit the timerange filter... I did not tell it that.
@kent010341 Exactly which version of the ELK stack are you on? There has not been an OSS version since 7.10 ~Nov 2020... relatively ancient in Elastic terms.
Yup, so you can use daily indices like you are, and you can limit the scope of the search manually if you like. I am not sure whether the automatic skipping of shards will happen with daily indices and index patterns; I suspect not.