What does Filebeat do about timestamps? Especially when "backpressure" feature is working.
More generally, how do I get synchronized timestamps?
On different machines, time may be misaligned and, moreover, it may change while Filebeat is running. That makes it impossible to correlate "surrounding" logs. On the other hand, it should definitely be possible to add some time synchronization logic.
By default Filebeat uses the time the log line was read, so if the read is delayed for any reason this time won't be accurate. In addition to that, Filebeat modules usually include processors that replace this timestamp with the time stored in the log line itself, which should be as accurate as the monitored system. In general it should be enough if all the monitored systems have synchronized time, which can be achieved using NTP.
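For standalone inputs (without a module), something similar can be configured by hand with the timestamp processor. A minimal sketch, assuming the log line starts with an ISO 8601 time that a dissect step first extracts (the tokenizer and field names here are hypothetical, adjust to your log format):

```yaml
# filebeat.yml sketch: replace @timestamp with the time parsed
# from the log line itself, instead of the read time.
processors:
  - dissect:
      # hypothetical layout: "<timestamp> <rest of message>"
      tokenizer: "%{event_time} %{msg}"
      field: "message"
  - timestamp:
      field: "dissect.event_time"
      layouts:
        - "2006-01-02T15:04:05Z07:00"  # Go-style reference layout
```

The layouts use Go's reference-time notation, so the literal date above describes the expected format rather than a fixed value.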
Times are sent in UTC by default, which should prevent problems with daylight saving time changes and so on.
What is your use case? Do you need more accurate time synchronization?
Hi @jsoriano
I use VMs to test software and intentionally change system time on VMs as part of the tests. That's why I cannot rely on system time or use NTP. (Unusual, I know.)
Logstash/Elasticsearch servers have correct time.
Communication between VM and Logstash/Elasticsearch servers is quite quick.
Does that mean that the best I can do is to use timestamp added by Logstash?
Yes, then I guess the best you can do is to add a timestamp in Logstash, but take into account that this will be the time when the log line was processed by Logstash, which will surely include some delay.
BTW, when a file is read, is it done using poll/select with timeouts, or is the file just checked on a regular basis (e.g. every second)? In the second case I'll get an extra delay of up to the checking period...
Once harvesting of a file has started, it is by default read as soon as there is content available, more in a select fashion. close_inactive can be used to close inactive files so there are not so many files open at the same time.
For new files there can be some delay: Filebeat checks for new files every 10 seconds by default, which can be customized with the scan_frequency setting.
Also, once the logs are read they are sent to Logstash in batches, so there can be an additional delay until a batch is ready to be sent; there are settings to control that, like bulk_max_size.
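Putting the settings above together, a sketch of a latency-oriented filebeat.yml (paths, host, and the concrete values are illustrative, not recommendations):

```yaml
# filebeat.yml sketch: tune the delays discussed above.
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    scan_frequency: 1s       # check for new files more often (default 10s)
    close_inactive: 5m       # release handles of files with no new data

output.logstash:
  hosts: ["logstash:5044"]   # hypothetical host
  bulk_max_size: 512         # smaller batches are ready to send sooner
```

Lowering scan_frequency and bulk_max_size reduces delay at the cost of more file-system polling and more, smaller network requests.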
I'm currently implementing what we've discussed above and have hit a problem. Logstash takes the @timestamp field from Filebeat, and I couldn't find a way to reset the @timestamp field in an event received from Filebeat. Is it possible to use the time from the Logstash machine with beats?
Take a look at this thread. As the events will probably already have a @timestamp, Logstash won't set it to its current time, but you can use the ruby scripting filter to do whatever you want.
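For reference, a minimal Logstash filter along those lines, overwriting whatever @timestamp Filebeat sent with the Logstash host's clock (a sketch under the assumptions of this thread, not an official recipe):

```
filter {
  ruby {
    # Replace the event's @timestamp with the current time on the
    # Logstash machine, discarding the value set by Filebeat.
    code => "event.set('@timestamp', LogStash::Timestamp.now)"
  }
}
```

Note this makes @timestamp the processing time in Logstash, so it inherits the pipeline delay discussed above; it only helps when the source machines' clocks cannot be trusted at all.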