Timezone Problem


(Paulo Montanha) #1

Hi everyone,

I need to send data from Elasticsearch (input) to an S3 bucket (output). It should not be complicated.

We are using this Logstash configuration to export data from Elasticsearch and write it to an S3 bucket:

logstash.conf

input {
  elasticsearch {
    index => "servers"
    hosts => [ "myhostinformation.test:443" ]
    ssl => true
    query => '{"query":{"range": {"@timestamp":{"gt":"04/02/2018 00:00:00.000","lt": "04/02/2018 23:59:59.999","format": "dd/MM/yyyy HH:mm:ss.SSS||dd/MM/yyyy HH:mm:ss.SSS","time_zone": "-02:00"}}}}'
    size => 500
    scroll => "5m"
  }
}

output {
  s3 {
    region => "sa-east-1"
    time_file => 1                      # rotate to a new S3 file every minute
    bucket => "mybucket.com"
    prefix => "%{+YYYY}/%{+MM}/%{+dd}"  # date-based folder, e.g. 2018/02/04
    codec => json
  }
}

This is another Logstash configuration, used to read the files from the S3 bucket and index their contents into Elasticsearch:

input {
  s3 {
    bucket => "mybucket.com"
    prefix => "PREFIX_BUCKET"
    region => "sa-east-1"
    add_field => {
      inf => "s3"
    }
  }
}

filter {
  # split JSON objects written back to back ("}{") into one event each
  mutate { gsub => ["message", "}{", "}|\n|{"] }
  split { terminator => "|\n|" }
  json {
    source => "message"
  }
  json {
    source => "raw_message"
  }
}

output {
  elasticsearch {
    index => "historical-information"
    hosts => [ "myhostinformation.test:443" ]
  }
}
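For readers wondering what the filter section does: as the gsub on "}{" suggests, the JSON objects in each S3 file arrive back to back, and the mutate/split pair cuts them apart before the json filter parses each one. A rough Python equivalent, for illustration only (split_concatenated and MARK are hypothetical names, not Logstash APIs), assuming no field value itself contains the characters }{, which would break the Logstash version in the same way:

```python
import json

# Marker inserted at every object boundary; any string works as long as
# the replace step and the split step use the same one.
MARK = "|\n|"


def split_concatenated(message):
    """Sketch of the gsub + split + json steps: mark every "}{" boundary,
    split on the marker, and parse each piece as its own JSON object."""
    marked = message.replace("}{", "}" + MARK + "{")
    return [json.loads(part) for part in marked.split(MARK)]


line = '{"host":"a","n":1}{"host":"b","n":2}'
print(split_concatenated(line))  # → [{'host': 'a', 'n': 1}, {'host': 'b', 'n': 2}]
```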

===================
Problem: for some reason, the data between 22:00:00 and 23:59:59 is not there as expected, so the number of documents that ends up in the new index is wrong.

Important information:

  • We tried removing time_zone from the elasticsearch input query, but it did not help.
  • We tried changing lt to 05/02/2018 02:00:00, but it did not help.
  • We created this logstash.conf to migrate data directly between two indices, and everything works well:

input {
  elasticsearch {
    index => "servers"
    hosts => [ "myhostinformation.test:443" ]
    ssl => true
    query => '{"query":{"range": {"@timestamp":{"gt":"04/02/2018 00:00:00.000","lt": "04/02/2018 23:59:59.999","format": "dd/MM/yyyy HH:mm:ss.SSS||dd/MM/yyyy HH:mm:ss.SSS","time_zone": "-02:00"}}}}'
    size => 500
    scroll => "5m"
  }
}

output {
  elasticsearch {
    index => "historical-information"
    hosts => [ "myhostinformation.test:443" ]
  }
}

====

  • Logstash version: 5.6.9
  • Elasticsearch version: 5.3

===

When we run the query directly in Elasticsearch:

Correct result (source index servers):

GET servers/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gt": "04/02/2018 00:00:00.000",
        "lt": "04/02/2018 23:59:59.999",
        "format": "dd/MM/yyyy HH:mm:ss.SSS||dd/MM/yyyy HH:mm:ss.SSS",
        "time_zone": "-02:00"
      }
    }
  }
}

Response (excerpt):

"hits": {
  "total": 720817

Incorrect result (new index historical-information):

GET historical-information/_search

"hits": {
  "total": 639783

We don't know whether we are on the right track or whether this is a limitation. Please help!


(Ry Biesemeyer) #2

What timezone is the machine on which Logstash is running? Is it possible that the format string that determines the bucket prefix is putting items into the following day's bucket once the machine-local day rolls over?
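A minimal Python sketch of how a date-based prefix can push the last hours of the day into the next day's folder, assuming the %{+YYYY}/%{+MM}/%{+dd} prefix is rendered from the event's @timestamp converted to UTC. The helpers s3_prefix and local_prefix are hypothetical and only mimic the date formatting; they are not Logstash code.

```python
from datetime import datetime, timedelta, timezone

# Fixed -02:00 offset, as in the query's "time_zone": "-02:00"
LOCAL = timezone(timedelta(hours=-2))


def s3_prefix(ts):
    """Hypothetical helper: mimic "%{+YYYY}/%{+MM}/%{+dd}", assuming the
    date comes from the event timestamp converted to UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y/%m/%d")


def local_prefix(ts):
    """Hypothetical helper: the same prefix rendered in the -02:00 offset."""
    return ts.astimezone(LOCAL).strftime("%Y/%m/%d")


evening = datetime(2018, 2, 4, 22, 30, tzinfo=LOCAL)  # 04/02/2018 22:30 -02:00
print(s3_prefix(evening), local_prefix(evening))  # → 2018/02/05 2018/02/04

# Which local hours of 04/02/2018 land under the next day's UTC prefix?
moved = [h for h in range(24)
         if s3_prefix(datetime(2018, 2, 4, h, tzinfo=LOCAL)) != "2018/02/04"]
print(moved)  # → [22, 23]
```

Under that assumption, reading back only the 2018/02/04 prefix would miss exactly the 22:00 to 23:59 window described in the question.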


(Ry Biesemeyer) #3

Unrelated, but your queries may be losing items at the millisecond-level boundaries and should use "lte".
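To make the boundary point concrete, a small Python sketch comparing the original "gt"/"lt" bounds with inclusive "gte"/"lte" bounds, plus the common half-open alternative ("gte" midnight, "lt" next midnight). The constants and helper names are illustrative assumptions, not part of the original query. Note that "gt" also drops documents stamped exactly 00:00:00.000.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=-2))  # "time_zone": "-02:00"
DAY_START = datetime(2018, 2, 4, tzinfo=LOCAL)
DAY_END = DAY_START + timedelta(hours=23, minutes=59, seconds=59, milliseconds=999)
NEXT_DAY = DAY_START + timedelta(days=1)


def in_gt_lt(ts):
    # Original query: "gt" 00:00:00.000 and "lt" 23:59:59.999 (both exclusive)
    return DAY_START < ts < DAY_END


def in_gte_lte(ts):
    # Inclusive on both ends: "gte" 00:00:00.000 and "lte" 23:59:59.999
    return DAY_START <= ts <= DAY_END


def in_half_open(ts):
    # Half-open day: "gte" this midnight, "lt" next midnight
    return DAY_START <= ts < NEXT_DAY


edges = [DAY_START, DAY_END]
print([in_gt_lt(t) for t in edges])      # → [False, False]
print([in_gte_lte(t) for t in edges])    # → [True, True]
print([in_half_open(t) for t in edges])  # → [True, True]
```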


(Paulo Montanha) #4

We found the solution. The problem was the date format in the S3 output prefix: with the -02:00 time zone, the date used for the prefix was already the next day.

To solve it, we made this change:

before:

after:

After this change, all the data was there.

Thank you very much


(system) #5

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.