Why is Filebeat reading log files over and over again?

Hello all

I know this has been posted before, but I never found a satisfying answer/solution.
I was advised by my user success manager to post the problem here.

I am using a Windows 10 environment (also tried on Linux) and a simple configuration to read a log file with Filebeat.
To start Logstash I use the command .\bin\logstash -f .\config\sample.conf
Sample.conf:
input {
  beats { port => 5044 }
}
filter {
  grok {
    match => [
      "message", "%{TIMESTAMP_ISO8601:timestamp_string} %{SPACE}%{GREEDYDATA:line}"
    ]
  }
  mutate {
    remove_field => ["message", "timestamp_string"]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
  stdout {
    codec => rubydebug
  }
}

I start Filebeat with the command .\filebeat
Filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - ./sample.log

output.logstash:
  hosts: ["localhost:5044"]

Sample.log contains 14 records

What happens is that the log file is being read and sent over and over again, which results in a lot of duplicates. I found a way to avoid the duplicates with a fingerprint (see the sketch below), but that is not what I want.
I want Filebeat to pick up the log file only when something changes in the file, not to read it all over again.
I also tried ignore_older: 5s, but it gave the same results.
In the registry file data.json the offset is constantly reset to 0.
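
For reference, the fingerprint workaround I mentioned looked roughly like this. It is only a sketch (the hashed field and hash method are just what seemed reasonable); the idea is to use the hash as the Elasticsearch document ID so that a resent event overwrites the existing document instead of creating a duplicate:

filter {
  # Run the fingerprint before the mutate that removes the message field,
  # otherwise there is nothing left to hash.
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "MURMUR3"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    # Same content -> same _id, so duplicates collapse into one document.
    document_id => "%{[@metadata][fingerprint]}"
  }
}

This only hides the symptom, though; the data is still being resent, which is why I kept looking for the real cause.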

Question:
Why are these basic functions of Filebeat not working (what am I missing)?

Is it reading the log from a local file or a network drive? Is the file being appended to or copied into place?

The whole setup, including the log file, is on one machine.
I tried it both ways: copying the file into place and appending to the file with echo -n "text" >> /{path}/sample.log

Solved.
It's not Filebeat itself causing the problem but Logstash.

Cause:
Buggy Logstash 7.5.2

Solution:
Replace Logstash 7.5.2 with 7.5.1 or 7.6.1,
or fully upgrade to 7.6.1.

Hi, can you expand on how the Logstash bug contributed to the duplication? I am very new to both Filebeat and Logstash and am running the same very simple config as in this thread. I send a file through and it completes all the events. When the file is updated to include a couple of new records, all the records from the top of the file get written again.

Just had this exact same problem today. My server needed a restart after some updates, so I manually stopped all the Elastic-related services, restarted the machine, then brought all the services back up. For some reason it grabbed every log file and started indexing them again, even though they had all been read in the past. What am I missing here? I thought the design of the registry was to take this into account and know that the files were already read and indexed.

2020-04-21T07:45:27.035-0600 INFO registrar/registrar.go:145 Loading registrar data from /usr/local/var/lib/filebeat/registry/filebeat/data.json
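
For what it's worth, the registry location can also be set explicitly in filebeat.yml. This is only a sketch, and the path below simply mirrors the default location shown in the log line above:

# Pin the registry directory so it is easy to verify that the same state
# files are reused across service restarts (the default is ${path.data}/registry).
filebeat.registry.path: /usr/local/var/lib/filebeat/registry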

These files are copied from my Raspberry Pi once a day and then indexed, so they are not changing over time.

To fix this I now have to delete the index for 2020, restore all the daily log files, and re-index everything from scratch.

I don't know exactly how Logstash is related to the problem.
I was using version 7.5.2, then I tried some version combinations, which left me with one conclusion:

Elasticsearch 7.6.1, Kibana 7.6.1; Logstash 7.6.1, Filebeat 7.5.2: no duplication problem
Elasticsearch 7.6.1, Kibana 7.6.1; Logstash 7.5.2, Filebeat 7.6.1: duplication problem
Elasticsearch 7.6.1, Kibana 7.6.1; Logstash and Filebeat 7.5.2: duplication problem
Elasticsearch 7.5.2, Kibana 7.5.2; Logstash and Filebeat 7.6.1: no duplication problem
Elasticsearch 7.6.1, Kibana 7.6.1; Logstash and Filebeat 7.5.1: no duplication problem
Elasticsearch 7.5.1, Kibana 7.5.1; Logstash and Filebeat 7.6.1: no duplication problem

So try replacing Logstash with another version and see whether the problem still exists.
