Restoring the sequence of events

Hi,

I use EKB to process log data, and Filebeat sends data to Elasticsearch directly.

The objective is to have some way to display events in the same order they were read out of the log file.

With the "offset" field, I can do that inside a file. But in my case, all the "source" filed, which represents the file path, is same as log rotated. In other words, the lastest file is always called access.log.

Would appreciate any advice on how the problem might be overcome using the available options. Thanks!

If you sort first by @timestamp and then by offset, you should get the right result most of the time. Most of the time, for the following reasons (see the query sketch after this list):

  • Do you use the @timestamp added by Filebeat itself, or do you parse your own timestamp from the log line? If you use the Filebeat one, it can happen that the new file starts being harvested before the old one is finished, and events would get out of order. So I recommend using the timestamp of your log line.
  • If you use the timestamp of your log line, it could happen that two log lines have the exact same timestamp, with one at the end of the old file and the next at the beginning of the new file; in that case the order would not be correct. But I would be kind of surprised to see this.
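For illustration, here is a minimal sketch of such a sort in the Elasticsearch query DSL, assuming the top-level @timestamp and offset field names Filebeat used at the time (adjust them to your mapping):

```json
{
  "query": { "match_all": {} },
  "sort": [
    { "@timestamp": { "order": "asc" } },
    { "offset": { "order": "asc" } }
  ]
}
```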

Does this already help? Or do your log lines not have a timestamp?

Thanks very much! That can satisfy most of our needs.

Our log lines do have a timestamp. But because the smallest unit of time in them is the second, and we produce a lot of logs per second, we really do have two log lines with the exact same timestamp in two different files.

I wonder if we can enrich each event with the inode or a sequence number, so that we can restore the sequence of events more accurately?

Any suggestions would be appreciated : )

Another idea I had was using harvester_limit: 1 to make sure one log file is finished being read before the next one starts. The close_* options would have to be adjusted to your use case to make sure a file is not closed too early. Then you could use the timestamp generated by Filebeat itself for the sorting and would not even need the offset.
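A minimal configuration sketch of that idea, assuming a Filebeat version with the prospector-style config; the close_* values are illustrative and need tuning to your rotation scheme:

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/access.log*
    # Read files strictly one at a time so that Filebeat's own
    # @timestamp preserves the order across rotated files.
    harvester_limit: 1
    # Release the harvester slot once the end of a file is
    # reached, so the next file can start.
    close_eof: true
    # Do not close a slowly written file too early.
    close_inactive: 5m
```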

It would not be too hard to add inode / device as additional event information. Then you sort by inode/device, then timestamp and then offset. That could work.

For the sequence number: here the question is how this would work. It can happen that the first line of the new log file is read before the old one is finished, so we would have the same issue as with the offset. Filebeat itself does not know that the two log files are related. Any thoughts on this?

Thanks again! :grin:

We can get two log lines with the same timestamp generated by Filebeat, so maybe the first idea is not so good.

The second idea could work, but it needs some development work on Filebeat. Is there some configuration through which I can achieve that?

For the sequence number, I agree with you.

Unfortunately you are right on that one. Even though the timestamp is in milliseconds, if we have more than 1k events per second we already have conflicts there :frowning: Perhaps we need timestamps in nanoseconds? :smiley:
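As a tiny illustration, Go (which Beats is written in) can already format timestamps with nanosecond precision:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// RFC3339Nano keeps up to nanosecond precision,
	// e.g. 2017-03-02T10:15:04.123456789Z
	fmt.Println(time.Now().UTC().Format(time.RFC3339Nano))
}
```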

Currently there is no configuration for this. The easiest place for it would probably be in the harvester, where the event is generated: https://github.com/elastic/beats/blob/master/filebeat/harvester/log.go#L131. It could be added as meta information.
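A rough sketch of what that enrichment could look like, assuming Linux; fileIdentity is a hypothetical helper and the event map is only illustrative, the real harvester code differs:

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// fileIdentity returns the inode and device of an open file.
// Linux-specific: it relies on syscall.Stat_t, so other
// platforms would need their own implementation.
func fileIdentity(f *os.File) (inode, device uint64, err error) {
	info, err := f.Stat()
	if err != nil {
		return 0, 0, err
	}
	stat, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, 0, fmt.Errorf("no Stat_t available for %s", f.Name())
	}
	return stat.Ino, uint64(stat.Dev), nil
}

func main() {
	f, err := os.Open("/var/log/access.log")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer f.Close()

	inode, device, err := fileIdentity(f)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// In the harvester, these values could be attached to each
	// event next to "source" and "offset" as meta information.
	event := map[string]interface{}{
		"source": f.Name(),
		"offset": int64(0),
		"inode":  inode,
		"device": device,
	}
	fmt.Println(event)
}
```

Note that a rename during rotation keeps the inode, so events from before and after a rotation stay distinguishable until the inode is reused.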

Thanks very much :grin:

It is quite easy to do that. To avoid inode reuse, we can add a file id to the event temporarily. We are eagerly looking forward to Filebeat adding a similar feature in an official release.

Even in nanoseconds, a user might still have two log lines with the same timestamp, though it would be rare.

If you make some modifications to the code, it would be very nice if you could share them.

Thanks @ruflin so much.

This topic was automatically closed after 21 days. New replies are no longer allowed.