Timezone problem when output to Elasticsearch

kent010341 · August 17, 2022, 7:30am

I have multiple log files with the date in their file names, and I want to read them with the file input plugin and send them to Elasticsearch indices whose names contain the date.

However, some logs are being sent to the wrong index. For example, a log with timestamp 2022/08/10 is sent to the index log-2022.08.09.

I found that every log sent to the wrong index has a timestamp between 00:00:00 and 08:00:00, which matches the offset of the timezone I am in (UTC+8).

So I assume I am missing some setting, but I can't tell what is causing the problem.

Thanks.

Here's my logstash.conf

```
input {
  file {
    path => ["/usr/share/logstash/proj_logs/karaf*.log"]
    start_position => "beginning"
    mode => "tail"
    codec => multiline {
      pattern => "^\D"
      what => "previous"
    }
  }
  file {
    path => ["/usr/share/logstash/proj_logs/karaf*.log.gz"]
    mode => "read"
    codec => multiline {
      pattern => "^\D"
      what => "previous"
    }
  }
}

filter {
  grok {
    match => {
      "message" => "(?<timestamp>^\S[^\|]*\S)\s*\|\s*(?<level>\S[^\|]*\S)\s*\|\s*(?<thread>\S[^\|]*\S)\s*\|\s*(?<logger>\S[^\|]*\S)\s*\|\s*(?<bundle>\S[^\|]*\S)\s*\|\s*(?<msg>\S*.*)"
    }
    remove_field => ["message"]
  }

  fingerprint {
    concatenate_sources => true
    source => ["timestamp", "msg"]
    method => "MD5"
  }

  date {
    match => [ "timestamp", "ISO8601" ]
    remove_field => ["timestamp"]
    timezone => "Asia/Taipei"
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "karaf-%{+YYYY.MM.dd}"
    document_id => "%{fingerprint}"
  }
}
```

Badger · August 17, 2022, 3:47pm

That sounds like it is working as expected. LogStash::Timestamp objects (and Elasticsearch dates) are always stored in UTC. That means daily indexes roll over at midnight UTC, not at midnight local time.

The entire stack is built on the assumption that dates are UTC. You can change the timezone on the date filter to force elasticsearch to store dates in your local timezone, but don't be surprised if some things break when you do that.
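To make the rollover concrete: the output's `%{+YYYY.MM.dd}` is formatted from `@timestamp`, which is UTC, so anything logged between 00:00 and 08:00 in UTC+8 lands in the previous day's index. Below is a small Python illustration of that arithmetic, assuming a fixed UTC+8 offset for Asia/Taipei; `daily_index` is a hypothetical helper that only mimics the date part of the index name, not Logstash code.

```python
from datetime import datetime, timedelta, timezone

# Asia/Taipei is UTC+8 with no daylight saving time, so a fixed offset is
# enough for this illustration (no tz database needed).
TAIPEI = timezone(timedelta(hours=8))


def daily_index(ts, prefix="karaf-"):
    """Roughly what index => "karaf-%{+YYYY.MM.dd}" yields: the UTC date of @timestamp."""
    return prefix + ts.astimezone(timezone.utc).strftime("%Y.%m.%d")


# A line logged at 03:00 local time on 2022-08-10 ...
early = datetime(2022, 8, 10, 3, 0, tzinfo=TAIPEI)
print(early.astimezone(timezone.utc))  # 2022-08-09 19:00:00+00:00
print(daily_index(early))              # karaf-2022.08.09

# ... while anything from 08:00 local onwards is already on the same UTC day.
print(daily_index(datetime(2022, 8, 10, 8, 0, tzinfo=TAIPEI)))  # karaf-2022.08.10
```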

kent010341 · August 18, 2022, 1:42am

@Badger Thanks for the reply.

I removed the timezone setting, and now all the logs are sent to the right index.

However, removing the timezone causes Kibana to show documents in the wrong timezone. In my case, every document appears 8 hours later than it should.

For now, the only workaround I can think of is to build a custom field for the index name with a filter (such as a ruby filter).

I wonder if there's a better way to solve this problem.

Thanks.

Badger · August 18, 2022, 1:56am

By default, Kibana adjusts the UTC dates from Elasticsearch to the local timezone of the user's browser, although it can be configured (the dateFormat:tz advanced setting) to display them in any timezone you please.

kent010341 · August 18, 2022, 2:51am

@Badger Thanks for the explanation.

I understand that Kibana adjusts the time to the local timezone, but since my logs are generated in UTC+8, I think removing the timezone setting (so that the default, UTC, is used) and configuring Kibana to show timestamps in UTC is only a workaround.

If there is no way to make %{+YYYY.MM.dd} in the Elasticsearch output plugin use the local time, I will write a ruby filter that creates a custom field for the index name.
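A minimal sketch of the index-name logic such a custom field would need, written in Python purely for illustration; `local_index_name` and the fixed `+08:00` offset are assumptions, not part of any Logstash API. It assumes the date filter keeps `timezone => "Asia/Taipei"`, so that `@timestamp` holds the correct UTC instant and only the index name is shifted to local time.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the logs are produced in a fixed UTC+8 zone (Asia/Taipei has no DST).
LOCAL_TZ = timezone(timedelta(hours=8))


def local_index_name(utc_ts, prefix="karaf-"):
    """Build the index name from the *local* calendar date of a UTC instant.

    The timestamp itself is left untouched (still UTC); only the name changes.
    """
    if utc_ts.tzinfo is None:
        raise ValueError("expected a timezone-aware timestamp")
    return prefix + utc_ts.astimezone(LOCAL_TZ).strftime("%Y.%m.%d")


# @timestamp is 19:00 UTC on the 9th, i.e. 03:00 on the 10th in Taipei.
print(local_index_name(datetime(2022, 8, 9, 19, 0, tzinfo=timezone.utc)))  # karaf-2022.08.10
```

Inside Logstash the same idea would presumably be a ruby filter along the lines of `event.set("[@metadata][index_date]", event.get("@timestamp").time.localtime("+08:00").strftime("%Y.%m.%d"))`, with the output using `index => "karaf-%{[@metadata][index_date]}"`. Fields under `[@metadata]` are not sent to Elasticsearch, so the helper field would not end up in the documents. Treat that one-liner as an untested sketch.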

Badger · August 18, 2022, 3:10am

Why do you care which index the data is written to? If you are trying to make sure that queries for a particular day only get sent to one index, then I think that is a micro-optimization that is rarely worth it. I believe date-related queries against an index that contains nothing in the date range are very fast.

Kibana is normally configured to query a wildcard set of indexes. I never noticed that the size of that set significantly affected the cost of the query. My experience was limited to comparing a single index with ~30 daily indexes for a month, but it didn't feel slower.

I know there could be use cases where a call to the elasticsearch REST API might make it easier to code the call to a single index, but with Kibana I think that is all handled for you.

@stephenb I know some folks like the idea that their personal day matches the index day. Is there any Elastic commentary / blog / documentation on this?

kent010341 · August 18, 2022, 3:30am

Currently, our ELK stack is deployed with the OSS distribution, which doesn't include index lifecycle management (please let me know if I'm wrong about that).

I care which index a document is written to because I want to purge old indices based on their date. For now, I use the index name to decide which indices are old enough to be purged.
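Without ILM, retention has to be driven from the index names. Here is a minimal Python sketch of that selection step; `indices_to_purge`, the `karaf-` prefix and the 7-day retention are illustrative assumptions. The list of names could come from the `_cat/indices` API, and each selected index could then be removed with `DELETE /<index>`. Elasticsearch Curator's `age` filter (with `source: name` and a `timestring`) implements the same idea as a ready-made tool.

```python
from datetime import date, datetime, timedelta


def indices_to_purge(names, today, keep_days, prefix="karaf-"):
    """Return the daily indices whose name date falls before today - keep_days."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a YYYY.MM.DD date; leave it alone
        if day < cutoff:
            old.append(name)
    return sorted(old)


names = ["karaf-2022.08.01", "karaf-2022.08.09", "karaf-2022.08.17", "other-2022.01.01"]
print(indices_to_purge(names, today=date(2022, 8, 18), keep_days=7))
# ['karaf-2022.08.01', 'karaf-2022.08.09']
```

With the default naming, the date in each name is a UTC day, so the retention boundary is offset by up to 8 hours from local midnight; for purging, that is usually harmless.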

Also, one of our ELK stacks is deployed in a private network, and if I want to take a snapshot to move one of the indices to another Elasticsearch instance, I have to make sure I choose the right index.

stephenb · August 18, 2022, 4:13am

@Badger I am not sure exactly what you are referring to...

BUT if you are asking whether Kibana / Elasticsearch will only search the appropriate time-series indices / shards, the answer is yes (or can be yes).

Certainly, you can always search specific indices to limit the scope.

But ILM is part of the Free / Basic license, and when using ILM, Elasticsearch will only search the appropriate shards. Interestingly, I was having a discussion on this very topic today. So no, a user does not need to know which indices / shards hold the data for a given time range; the cluster state keeps track of that.

I would need to check whether ordinary non-ILM "daily indices" are treated the same way.

Here's a quick example of a 15-minute query I ran today against 90 days' worth of data (~0.25 PB).
Note that Elasticsearch knew to skip 2370 of the 2382 total shards because they did not fit the time-range filter... I did not tell it that.

```json
{
  "id": "asdfkasdfkasdflkajshdflaskjdfhaslkdfjhasldfkjahsdlkasjdhfasldkfjh",
  "rawResponse": {
    "took": 705,
    "timed_out": false,
    "_shards": {
      "total": 2382,
      "successful": 2382,
      "skipped": 2370, <!---- YAY Skipped!
      "failed": 0
    },
```

@kent010341 Exactly which version of the ELK stack are you on? There has not been an OSS version since 7.10 ~Nov 2020... relatively ancient in Elastic terms.

kent010341 · August 18, 2022, 4:31am

The OSS versions I'm using are:

Elasticsearch 7.10.2
Kibana 7.10.2
Logstash 7.12.0

system · August 18, 2022, 4:31am

Elasticsearch 7.10 is EOL and no longer supported. Please upgrade ASAP.

(This is an automated response from your friendly Elastic bot. Please report this post if you have any suggestions or concerns )

stephenb · August 18, 2022, 4:36am

ILM is part of X-Pack (not the OSS distribution) in 7.10.

Yup, so you can use daily indices as you are doing, and you can limit the scope of the search manually if you like... I am not sure whether the automatic skipping of shards will happen with daily indices and index patterns; I suspect not.
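To limit a search manually, the daily indices that can hold a given time range can be computed from the range itself. A minimal Python sketch, assuming the default UTC-based `karaf-%{+YYYY.MM.dd}` naming; `indices_for_range` is a hypothetical helper. It also shows why one local day in UTC+8 always touches two UTC-named indices, which matters when picking indices to snapshot.

```python
from datetime import datetime, timedelta, timezone

TAIPEI = timezone(timedelta(hours=8))  # fixed UTC+8, no DST


def indices_for_range(start, end, prefix="karaf-"):
    """Daily index names (one per UTC day) that may hold documents in [start, end]."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    names = []
    while day <= last:
        names.append(prefix + day.strftime("%Y.%m.%d"))
        day += timedelta(days=1)
    return names


# The whole local day 2022-08-10 in Taipei spans two UTC days.
start = datetime(2022, 8, 10, 0, 0, tzinfo=TAIPEI)
end = datetime(2022, 8, 10, 23, 59, 59, tzinfo=TAIPEI)
print(",".join(indices_for_range(start, end)))  # karaf-2022.08.09,karaf-2022.08.10
```

The comma-separated string can be used as the index part of a search request path (for example `GET /karaf-2022.08.09,karaf-2022.08.10/_search`). The query should still carry its own time-range filter, because each of those indices also holds hours outside the local day.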

system · September 15, 2022, 4:36am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
