I know this has been posted before, but I never found a satisfying answer/solution.
I was advised by my user success manager to post the problem here.
I am using a Windows 10 environment (also tried on Linux).
I am using a simple configuration to read a log file with Filebeat.
To start Logstash I use the command .\bin\logstash -f .\config\sample.conf
Sample.conf:
input {
  beats {
    port => 5044
  }
}

filter {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp_string} %{SPACE}%{GREEDYDATA:line}" }
  }
  mutate {
    # field names must be quoted strings
    remove_field => [ "message", "timestamp_string" ]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
  }
  stdout {
    codec => rubydebug
  }
}
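If a config has a syntax problem (for example, unquoted field names in remove_field), Logstash will refuse to start. The --config.test_and_exit flag validates the pipeline without running it:

  .\bin\logstash -f .\config\sample.conf --config.test_and_exit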
I start Filebeat with the command .\filebeat
Filebeat.yml:
filebeat.inputs:
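The rest of filebeat.yml isn't shown in the post; a minimal input section along these lines is presumably what was in use (the path and Logstash host below are placeholders, not taken from the original):

  filebeat.inputs:
    - type: log
      enabled: true
      paths:
        - C:\logs\sample.log        # placeholder path

  output.logstash:
    hosts: ["localhost:5044"]       # matches the beats input port above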
What happens is that the log file is read and sent over and over again, which produces a lot of duplicates. I found a way to avoid the duplicates by using a fingerprint, but that is not what I want.
I want Filebeat to pick up only the new content when the file changes, not re-read the whole file every time.
I also tried ignore_older: 5s, but it gave the same results.
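For reference, the documented constraint is that clean_inactive must be greater than ignore_older + scan_frequency, and a 5s ignore_older is far shorter than the default 10s scan_frequency. A sketch with illustrative values only (not taken from the original post):

  filebeat.inputs:
    - type: log
      paths:
        - C:\logs\sample.log    # placeholder path
      ignore_older: 48h         # skip files not modified in the last 48h
      clean_inactive: 72h       # must be greater than ignore_older + scan_frequency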
In the registry file data.json, the offset is constantly reset to 0.
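One way to watch this is to dump the tracked offsets from the registry. This sketch assumes the flat pre-7.9 registry layout, where data.json is a JSON array of file states with source and offset fields, and that jq is installed; the exact path depends on path.data (on a default extract it sits under data/registry/filebeat/):

  jq '.[] | {source, offset}' data/registry/filebeat/data.json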
Question:
Why are basic functions of Filebeat not working (what am I missing)?
The whole setup is on one machine, including the log file.
I tried it both ways: copying the file, and appending to it with echo -n "text" >> /{path}/sample.log
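Two things may be worth checking here, assuming default log-input behaviour. Filebeat identifies files by inode and device rather than by name, so copying a file in place creates a new inode and Filebeat re-reads it from offset 0; and the harvester only ships complete, newline-terminated lines, so echo -n leaves a trailing partial line that is never sent:

  # append with a trailing newline so the harvester can ship the line
  echo "text" >> /{path}/sample.log

  # compare the inode before and after a copy; a changed inode means a "new" file to Filebeat
  stat -c '%i' /{path}/sample.log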
Hi, can you expand on how the Logstash bug contributed to the duplication? I'm very new to both Filebeat and Logstash and I'm running the same very simple config as in this thread. I send a file through and it completes all the events. When I update the file to include a couple of new records, all the records from the top of the file get written again.
Just had this exact same problem today. My server needed a restart after some updates, so I manually stopped all the Elastic-related services, restarted the machine, then brought all the services back up. For some reason Filebeat grabbed every log file and started indexing them again, even though they had all been read in the past. What am I missing here? I thought the registry was designed to take this into account and know which files were already read and indexed.
2020-04-21T07:45:27.035-0600 INFO registrar/registrar.go:145 Loading registrar data from /usr/local/var/lib/filebeat/registry/filebeat/data.json
These files are copied from my Raspberry Pi once a day and then indexed, so they do not change over time.
To fix this I now have to delete the 2020 index, restore all the daily log files, and re-index everything from scratch.
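If it helps anyone in the same spot, the cleanup step can be done against the Elasticsearch API; the index pattern below is hypothetical and needs to match whatever your setup actually writes to:

  # delete the affected indices (pattern is a placeholder)
  curl -X DELETE "http://localhost:9200/filebeat-*-2020*"

After that, re-running Filebeat over the restored daily files re-indexes them from scratch.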