I'm using the Logstash file input to index various log files available in the input location. These log files are generated by log4j on various systems and are fetched to the input location by a cron job.
Is there any way I can use Filebeat to fetch the logs created by log4j on these systems? As mentioned above, I currently use a batch-script cron job to fetch them from the user systems to the input location.
By default, log4j creates a backup of a log file once it reaches a size limit, so Logstash receives duplicate logs from time to time.
Is there any way to use an ID to avoid indexing duplicate data? Currently I'm using a custom document_id built from a combination of the @timestamp and ID fields (see my output filter below). But this seems to overwrite the already-indexed data (correct me if I'm wrong here). Instead, I would like to skip indexing entirely when the document is a duplicate.
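For reference, a setup like the one described might look roughly like this (a sketch only; the index name and the `log_id` field are assumptions, since the actual output config isn't shown):

```conf
output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "applogs-%{+YYYY.MM.dd}"
    # Assumed field name; the ID combines @timestamp with a per-event ID field.
    document_id => "%{@timestamp}-%{log_id}"
    # With the default action ("index"), a repeated document_id overwrites the
    # existing document. action => "create" instead rejects the duplicate
    # with a version-conflict error, leaving the first copy untouched.
    action      => "create"
  }
}
```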
Yes, the action option above does prevent duplicates from being indexed. But since it sits in the output plugin, I can see the duplicates still passing through all of my filter plugin operations, which is redundant work.
Is there any way to identify a duplicate before the filter plugins run and drop it there?
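One common pattern (a sketch, and only a partial answer to the question above) is the fingerprint filter: placed first in the filter chain, it derives a stable ID from the event content, which the output then uses as the document_id. Note it only computes the ID; Logstash has no built-in state to actually drop a previously-seen event mid-pipeline, so the duplicate is still rejected at index time by `action => "create"`. Field choices below are assumptions:

```conf
filter {
  fingerprint {
    # Hash the event content that defines "the same log line" (assumed fields).
    source              => ["@timestamp", "message"]
    concatenate_sources => true
    method              => "SHA1"
    key                 => "dedup-key"
    # @metadata fields are not indexed, so the hash doesn't pollute the document.
    target              => "[@metadata][fingerprint]"
  }
  # ...the rest of the filters still run; duplicates cannot be dropped here
  # without external state (e.g. an elasticsearch filter lookup).
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]
    document_id => "%{[@metadata][fingerprint]}"
    action      => "create"
  }
}
```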
@ptamba Yes, I can use the Filebeat agent and point its output to my Logstash instance.
But in my case there are hundreds of users. Do I need to manually install Filebeat on each system, or is there a simpler way to do this?
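For what it's worth, the per-machine Filebeat side is a small config; the rollout across hundreds of systems is usually automated with a configuration-management tool (Ansible, Puppet, Chef, or Group Policy on Windows) rather than done by hand. A minimal sketch, assuming Filebeat 6+ and made-up paths and hostnames:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log        # assumed log4j output location
    # Join multi-line stack traces onto their parent log line; the pattern
    # assumes entries start with an ISO-style date (adjust to your layout).
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after

output.logstash:
  hosts: ["logstash.example.com:5044"]   # assumed Logstash beats endpoint
```

With this in place, the Logstash side switches from the file input to the beats input on the matching port, and the cron-based copy step is no longer needed.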