Filebeat records skipped and out of order

Hi, I am very new to this, but I am trying to create what I thought was a very simple Filebeat configuration. The file I am reading is not a "normal" log file with timestamps and formatted fields. The first few lines look like:
Reverse date time = 621170805
Sequence number = 2147483952
Component = SNSS1
Security event = FALSE
Event number = 4101
Event name = ARG Information
Event class = Software
Event severity = Info

My config is pretty simple:
- type: log
  enabled: true
  paths:
    - D:\testdata*
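For reference, indentation matters in that YAML; a complete minimal filebeat.yml around that input might look like the sketch below (the Logstash host and port are assumptions, not from the original post):

```yaml
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - D:\testdata*

output.logstash:
  # Assumed endpoint -- replace with your actual Logstash host.
  hosts: ["localhost:5044"]
```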

Watching my Logstash logs, I see it skip the first few lines in the file every time... not always the same number of lines either. I threw together a test input file:
Line 1
Line 2
Line 3
Line 4
And logstash said:
Line 2
Line 1
Line 3

So I added to the file:
Line 5
Line 6

And got in logstash:
Line 4
Line 5

I am very confused about what is going on here.

I think what you are experiencing is correct, but not what you are expecting.

First, let me clarify how Filebeat works.

When Filebeat reads a file, it tracks lines. For Filebeat, a line must end with \n; when you hit Enter, you add that invisible character. When a line doesn't end with that character, Filebeat considers it incomplete and won't attempt to read it. Logging libraries always append that newline when a log statement is complete.

Since you are editing the file manually, this is why you never see the last line: it doesn't contain the newline character, so for Filebeat that line is incomplete.
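You can see the same "complete line" distinction with `wc -l`, which counts newline characters rather than visual lines (the file path here is hypothetical):

```shell
# Write a file whose last line has NO trailing newline.
printf 'Line 1\nLine 2\nLine 3' > /tmp/test.log

# wc -l counts newline characters, so only the two terminated
# lines are counted -- the same rule Filebeat applies.
wc -l < /tmp/test.log    # prints 2

# Appending the final newline "completes" the last line.
printf '\n' >> /tmp/test.log
wc -l < /tmp/test.log    # prints 3
```

This is why an editor that saves without a trailing newline leaves the last line invisible to Filebeat until something else is appended after it.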

Ok. I'll buy that pending testing. It sounds reasonable. What about the out of sequence lines and the fact that some are missed? In my original file it was the first three or four lines that were always missed.

Watching my Logstash logs, I see it skip the first few lines in the file every time... not always the same number of lines either. I threw together a test input file:

It should read all the lines; from your examples I only see the last line skipped.
Are you always editing the same file on disk? Because we track the read offset on disk in a registry file (data/registry).

sequence lines

There are a few things to take into consideration when thinking about the sequence or order of events.

  1. Filebeat reads the file in order.
  2. Filebeat sends the events to Logstash in multiple batches.
  3. Events are sent to a queue inside Logstash.
  4. Logstash by default starts multiple workers picking up events from the queue. The number of workers defaults to the number of cores on the machine.

When you get to point 4, ordering is not guaranteed due to the nature of asynchronous workers.
In some cases you can achieve ordering by configuring Logstash to have only one worker (see pipeline.workers). However, this drastically affects performance, and I do not recommend it.
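As a sketch, a single-worker pipeline can be set in logstash.yml (or with the `-w 1` command-line flag); this keeps events in arrival order at the cost of throughput:

```yaml
# logstash.yml -- force a single pipeline worker so events are
# processed (and therefore output) in the order they were queued.
pipeline.workers: 1
```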

What is your use case to require ordering?

I do expect the ordering to remain intact. I will sanitize my original file and publish it. What I noticed first was that only the middle of the file was being transmitted (apparently). The first and last few lines were always missing.

In that case, configuring Logstash to use one worker should work, but it will make things slower and increase the risk of blocking the pipeline.

We don't have nanosecond precision yet; if we did, maybe using the generated timestamp could solve your case.

Ah! That makes perfect sense. Thanks Pier.

This certainly solved the sequencing problem, although, as you say, there are potential performance implications. I can design around that for now. Thanks Pier.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.