Hi, I am very new to this, but I am trying to create what I thought was a very simple Filebeat configuration. The file I am reading is not a "normal" log file with timestamps and formatted fields. The first few lines look like:
Reverse date time = 621170805
Sequence number = 2147483952
Component = SNSS1
Security event = FALSE
Event number = 4101
Event name = ARG Information
Event class = Software
Event severity = Info
My config is pretty simple:
- type: log
  enabled: true
  paths:
    - D:\testdata*
Watching my Logstash logs, I see it skip the first few lines in the file every time, and not always the same number of lines either. I threw together a test input file:
Line 1
Line 2
Line 3
Line 4
And Logstash said:
Line 2
Line 1
Line3
I think what you are experiencing is correct, but not what you are expecting.
First, let me clarify how Filebeat works.
When Filebeat reads a file, it tracks lines, and for Filebeat a line must end with \n. When you hit Enter, you add that invisible character. When a line doesn't contain that character, Filebeat considers it incomplete and won't attempt to read it. Logging libraries normally append that newline when a log statement is complete.
Since you are editing the file manually, you never see the last line: it doesn't contain the newline character, so for Filebeat that line is incomplete.
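To make the newline rule concrete, here is a minimal Python sketch of a reader that only emits newline-terminated lines. It is an illustration of the behaviour described above, not Filebeat's actual code; the function names `read_complete_lines` and `demo` are made up for this example.

```python
import os
import tempfile


def read_complete_lines(path, offset=0):
    """Return (lines, new_offset), emitting only newline-terminated lines."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    lines = []
    consumed = 0
    while True:
        end = data.find(b"\n", consumed)
        if end == -1:
            break  # trailing text without "\n" is treated as incomplete
        lines.append(data[consumed:end].rstrip(b"\r").decode("utf-8"))
        consumed = end + 1
    return lines, offset + consumed


def demo():
    # Hypothetical test file: four lines, the last one has no final newline.
    with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
        tmp.write(b"Line 1\nLine 2\nLine 3\nLine 4")
        path = tmp.name
    try:
        first, offset = read_complete_lines(path)
        with open(path, "ab") as f:
            f.write(b"\n")  # "hitting Enter" completes the last line
        second, offset = read_complete_lines(path, offset)
        return first, second
    finally:
        os.remove(path)
```

In this sketch the first read returns only the first three lines, and "Line 4" shows up on the next read once the newline has been appended.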
Ok. I'll buy that pending testing. It sounds reasonable. What about the out of sequence lines and the fact that some are missed? In my original file it was the first three or four lines that were always missed.
It should read all the lines. From your examples, I only see the last line skipped?
Are you always editing the same file on disk? Because we track the read offset on disk in a registry file (data/registry).
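As a rough illustration of why the registry matters when you keep editing the same file, here is a small Python sketch that persists a read offset per file. The JSON layout, file names, and the function `read_new_lines` are assumptions made up for this example; Filebeat's real registry format is different.

```python
import json
import os
import shutil
import tempfile


def read_new_lines(log_path, registry_path):
    """Read lines added since the last call, persisting the offset in a JSON file."""
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    offset = registry.get(log_path, 0)
    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # 0 when no complete line is available
    lines = data[:end].decode("utf-8").splitlines()
    registry[log_path] = offset + end
    with open(registry_path, "w") as f:
        json.dump(registry, f)
    return lines


def demo():
    workdir = tempfile.mkdtemp()
    try:
        log_path = os.path.join(workdir, "test.log")
        registry_path = os.path.join(workdir, "registry.json")
        with open(log_path, "wb") as f:
            f.write(b"Line 1\nLine 2\n")
        first = read_new_lines(log_path, registry_path)
        second = read_new_lines(log_path, registry_path)  # nothing new yet
        with open(log_path, "ab") as f:
            f.write(b"Line 3\n")
        third = read_new_lines(log_path, registry_path)
        return first, second, third
    finally:
        shutil.rmtree(workdir)
```

Once an offset has been stored, anything before it is not sent again, so a file that was already read will only produce events for content appended after that point.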
Regarding the out-of-sequence lines, there are a few things to take into account when considering the order of events.
1. Filebeat reads the file in order.
2. Filebeat sends the events to Logstash in multiple batches.
3. Events are sent to a queue inside Logstash.
4. Logstash by default starts multiple workers that pick up events from the queue. The number of workers defaults to the number of cores of the machine.
When you are at point 4, the ordering is not guaranteed because the workers run asynchronously.
In some cases, you can achieve ordering by configuring Logstash to have only 1 worker (see pipeline.workers). However, this drastically affects performance, and I do not recommend it.
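The effect of the worker count on ordering can be shown with a toy Python simulation. This is only a sketch of the general idea (several threads draining one queue), not how Logstash is implemented; real workers take events in batches, and the random sleep stands in for filter work. The name `run_pipeline` is invented for this example.

```python
import queue
import random
import threading
import time


def run_pipeline(events, workers):
    """Drain a shared queue with `workers` threads and return the output order."""
    q = queue.Queue()
    for event in events:
        q.put(event)
    output = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                event = q.get_nowait()
            except queue.Empty:
                return
            time.sleep(random.uniform(0, 0.01))  # simulated processing time
            with lock:
                output.append(event)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output


if __name__ == "__main__":
    lines = ["Line 1", "Line 2", "Line 3", "Line 4"]
    print(run_pipeline(lines, workers=1))  # same order as the input
    print(run_pipeline(lines, workers=4))  # same events, order may differ
```

With a single worker the output order always matches the input. With several workers every event still arrives, but the order depends on which thread finishes first.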
I do expect the ordering to remain intact. I will sanitize my original file and publish it. What I noticed first was that only the middle of the file was being transmitted (apparently). The first and last few lines were always missing.
This certainly solved the sequencing problem, although, as you say, there are potential performance implications. I can design around that for now. Thanks Pier.