How to avoid sending duplicated log data to Redis


(Jin Park) #1

My Filebeat version is 6.4.2.
As the title says, I want to avoid sending duplicate data to Redis.
Assume I have a JSON log file that contains a single JSON document, as follows:

{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }

{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }

After creating the log file, I ran Filebeat and the data was stored in Redis without any issue.
Then I added another JSON document to the log file, so it now contains two documents:

{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "135569" } }

{ "id": "135569", "title" : "Star Trek Beyond", "year":2016 , "genre":["Action", "Adventure", "Sci-Fi"] }

{ "create" : { "_index" : "movies", "_type" : "movie", "_id" : "122886" } }

{ "id": "122886", "title" : "Star Wars: Episode VII - The Force Awakens", "year":2015 , "genre":["Action", "Adventure", "Fantasy", "Sci-Fi", "IMAX"] }

When I saved the log file, Filebeat picked up the change automatically, and Redis now holds 3 entries, with the data for _id 135569 duplicated.

How can I avoid sending duplicated data to Redis?

Here is my filebeat.yml; it does not have any special options yet.

#=========================== Filebeat inputs =============================

filebeat.inputs:

- type: log
  enabled: true
  paths:
    - /Users//json_files/Log/*.log

...
...
#-------------------Redis Output ----------
output.redis:
   hosts: ["localhost"]
   key: "filebeat_test_log"
   db: 0
   timeout: 5


(Steffen Siering) #2

It sounds like you are adding the extra lines in a text editor. Filebeat tracks file identity by inode. Many editors do not modify the file on disk in place; instead they write a new file and replace the old file with it. Filebeat detects this as a new file (a new inode) and sends its contents again.
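To illustrate the inode behavior, here is a minimal sketch using only the Python standard library, assuming a POSIX filesystem such as Linux or macOS. It compares the inode after an in-place append with the inode after a "write a temp file, then rename it over the original" save, which approximates what many editors do. The function names and file name are hypothetical, and real editors differ depending on their save settings.

```python
import os
import tempfile


def inode_after_append(path: str, line: str) -> int:
    """Append a line in place (like `echo ... >> file`) and return the file's inode."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return os.stat(path).st_ino


def inode_after_replace(path: str, line: str) -> int:
    """Hypothetical 'editor-style' save: write a new temp file, rename it over the original."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # The temp file is created while the original still exists, so it gets a different inode.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(content + line + "\n")
    os.replace(tmp_path, path)  # atomic rename: `path` now refers to the new inode
    return os.stat(path).st_ino


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "movies.log")  # hypothetical file name
        with open(log, "w", encoding="utf-8") as f:
            f.write('{"id": "135569"}\n')
        original = os.stat(log).st_ino
        appended = inode_after_append(log, '{"id": "122886"}')
        replaced = inode_after_replace(log, '{"id": "000000"}')
        print("append keeps inode: ", appended == original)
        print("replace keeps inode:", replaced == original)
```

On a typical Linux or macOS filesystem the append keeps the inode and the replace does not. This suggests that, when testing, appending lines with a shell redirect such as `echo '...' >> file.log` should let Filebeat continue from its stored offset and ship only the new lines, whereas an editor's replace-on-save looks like a brand-new file.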