Restore the sequence of the events

ginger · December 1, 2016, 1:52pm

Hi,

I use EKB to process log data, and Filebeat sends the data directly to Elasticsearch.

The objective is to have some way to render the sequence of events in the same order they were read out of the log file.

With the "offset" field, I can do that within a single file. But in my case the "source" field, which holds the file path, is the same for every file because the log is rotated. In other words, the latest file is always called access.log.

Would appreciate any advice on how the problem might be overcome using the available options.

ginger · December 2, 2016, 1:38am

Would appreciate any suggestions, thanks.

ruflin · December 5, 2016, 3:27pm

If you sort first by @timestamp and then by offset, you should get the right result most of the time. Only most of the time, for the following reasons:

Do you use the @timestamp from Filebeat itself, or do you process your own timestamp? If you use the Filebeat one, the new file can start being harvested before the old one is finished, and the events would get out of order. So I recommend processing the timestamp of your log line.
If you use the timestamp of your log line and two log lines have exactly the same timestamp, one at the end of the old file and the next at the beginning of the new file, the order would not be correct. But I would be kind of surprised to see this.
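The two-key sort described above can be sketched in Python as follows. This is only an illustration: the event dictionaries, their values and the `message` texts are made up, and only the field names `@timestamp` and `offset` come from this thread. The second half reproduces the tie case, where two lines share a timestamp across a rotation.

```python
from datetime import datetime

# Hypothetical events, e.g. as returned by an Elasticsearch query.
events = [
    {"@timestamp": "2016-12-01T13:52:01Z", "offset": 120, "message": "c"},
    {"@timestamp": "2016-12-01T13:52:00Z", "offset": 60, "message": "b"},
    {"@timestamp": "2016-12-01T13:52:00Z", "offset": 0, "message": "a"},
]


def sort_key(event):
    """Sort by timestamp first, then by byte offset within the file."""
    ts = datetime.strptime(event["@timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return (ts, event["offset"])


messages = [e["message"] for e in sorted(events, key=sort_key)]

# Tie case: same timestamp on both sides of a log rotation.
# The offset restarts at 0 in the new file, so the new line sorts first.
rotated = [
    {"@timestamp": "2016-12-01T13:52:00Z", "offset": 5000,
     "message": "last line of old file"},
    {"@timestamp": "2016-12-01T13:52:00Z", "offset": 0,
     "message": "first line of new file"},
]
tie_order = [e["message"] for e in sorted(rotated, key=sort_key)]
```

In the tie case the sort places the first line of the new file ahead of the last line of the old file, which is the wrong order described in the second point.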

Does this already help? Or do your log lines not have a timestamp?

ginger · December 6, 2016, 3:25am

Thanks very much! That can satisfy most of our needs.

All of our log lines have a timestamp. But because the smallest unit of time is one second and we write a lot of log lines per second, we really do have two log lines with exactly the same timestamp in two different files.

I wonder if we can enrich each event with an inode or a sequence number, so that we can restore the sequence of the events more accurately?

Any suggestions would be appreciated : )

ruflin · December 6, 2016, 8:01am

Another idea I had was using harvester_limit: 1 to make sure one log file is finished reading before the next one starts. The close_* options would have to be adjusted to your use case to make sure the file is not closed too early. Then you could use the timestamp generated by Filebeat itself for the sorting and would not even need the offset.

It would not be too hard to add inode / device as additional event information. Then you sort by inode/device, then timestamp and then offset. That could work.
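As a rough sketch of how such fields could be used on the query side, the Python below groups events by file identity, orders the files by their earliest timestamp, and orders lines within each file by offset. This is one possible reading of the suggestion, not an existing Filebeat feature: the `inode` and `device` fields are hypothetical, the sample events are invented, and the code assumes an inode is not reused by a later file.

```python
from collections import defaultdict


def restore_order(events):
    """Rebuild read order from hypothetical device/inode fields."""
    # Group events by file identity; assumes (device, inode) is unique per file.
    by_file = defaultdict(list)
    for event in events:
        by_file[(event["device"], event["inode"])].append(event)

    # Order files by their earliest timestamp. ISO 8601 UTC strings in the
    # same format compare chronologically as plain strings.
    files = sorted(
        by_file.values(),
        key=lambda evs: min(e["@timestamp"] for e in evs),
    )

    # Within one file, the byte offset gives the original line order.
    ordered = []
    for evs in files:
        ordered.extend(sorted(evs, key=lambda e: e["offset"]))
    return ordered


# Invented sample: the old and the new file share the 13:52:00 second.
events = [
    {"device": 1, "inode": 12, "@timestamp": "2016-12-01T13:52:00Z",
     "offset": 0, "message": "new-1"},
    {"device": 1, "inode": 11, "@timestamp": "2016-12-01T13:52:00Z",
     "offset": 5000, "message": "old-2"},
    {"device": 1, "inode": 12, "@timestamp": "2016-12-01T13:52:01Z",
     "offset": 80, "message": "new-2"},
    {"device": 1, "inode": 11, "@timestamp": "2016-12-01T13:51:59Z",
     "offset": 4900, "message": "old-1"},
]
restored = [e["message"] for e in restore_order(events)]
```

The sketch still cannot decide the order if two files have the same earliest timestamp, for example when a file contains only lines from a single second.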

For the sequence number: Here the question is how this would work? It can happen that the first line of the new log file is read before the old one is finished. So we would have the same issue as with the offset. Filebeat itself does not know the 2 log files are related. Any thoughts on this?

ginger · December 6, 2016, 12:00pm

Thanks again!

We do have log lines with the same timestamp generated by Filebeat, so maybe the first idea is not so good.

The second idea could work, but we would need to do some development work on Filebeat. Is there a configuration option I can use to achieve that?

For the sequence number, I agree with you.

ruflin · December 6, 2016, 4:01pm

Unfortunately you are right about the first one. Even though the timestamp is in milliseconds, if we have more than 1k events per second we already have conflicts there. Perhaps we need timestamps in nanoseconds?
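The collision argument can be checked with a little arithmetic. The sketch below assumes 1500 events spread evenly over one second (both the rate and the even spacing are assumptions for illustration) and counts how many distinct timestamps survive at millisecond and at nanosecond resolution.

```python
EVENTS_PER_SECOND = 1500  # assumed rate, above the 1k/s threshold

# Nanosecond offset of each event within the second, evenly spaced.
nanos = [i * 1_000_000_000 // EVENTS_PER_SECOND
         for i in range(EVENTS_PER_SECOND)]

# The same instants truncated to millisecond resolution.
millis = [n // 1_000_000 for n in nanos]

distinct_millis = len(set(millis))  # at most 1000 values fit in one second
distinct_nanos = len(set(nanos))
```

With these assumptions only 1000 distinct millisecond values exist for 1500 events, so at least 500 events share a timestamp with another one, while all 1500 nanosecond values are distinct. Real traffic is bursty, so nanosecond ties remain possible.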

Currently there is no configuration for this. The easiest place for this would probably be here in the harvester where the event is generated: https://github.com/elastic/beats/blob/master/filebeat/harvester/log.go#L131 It could be added as meta information.

ginger · December 7, 2016, 2:47am

Thanks very much

It is quite easy to do that. To avoid problems with inode reuse, we can add a file ID to the event for the time being. We are eagerly looking forward to Filebeat adding similar features in an official version.

A user may still have two log lines with the same timestamp in nanoseconds, though it's rare.

ruflin · December 8, 2016, 8:30am

In case you make some modifications to the code, it would be very nice if you could share them.

ginger · December 12, 2016, 11:45am

Thanks @ruflin so much.

system · December 22, 2016, 1:52pm

This topic was automatically closed after 21 days. New replies are no longer allowed.
