I'm using the Logstash file input to index various log files available in an input location. These log files are generated by log4j on various systems and are copied to the input location by a cron job.
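The input side is just the stock file input pointed at that location, roughly like this (the path and sincedb location are placeholders, not my real values):

input {
  file {
    # placeholder path; this is wherever the cron job drops the fetched log4j files
    path => "/data/log-input/**/*.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_log4j"
  }
}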
Is there any way I can use Filebeat to ship the logs created by log4j on these systems? As mentioned above, I'm currently using a batch-script cron job to fetch them from the user systems to the input location.
By default, log4j rolls over to a backup file once a size limit is reached, so Logstash receives duplicate log entries from time to time.
Is there any way to use an ID to avoid indexing duplicate data? Currently, I'm using a custom document_id built from a combination of the @timestamp and ID fields (see my output configuration below). But this seems to overwrite the already-indexed data (correct me if I'm wrong here). Instead, I would like to skip indexing altogether when an event is a duplicate.
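Roughly, the output block looks like this (the host, index name, and exact ID field are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "log4j-%{+YYYY.MM.dd}"
    # document_id combines @timestamp with an ID field from the event
    document_id => "%{[@timestamp]}_%{[id]}"
  }
}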
The doc_as_upsert directive allows creation of a new document if the document_id does not exist in ES. ES will overwrite (update) the document if the same document_id already exists.
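For example, in the elasticsearch output (host, index, and ID format are just illustrative):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "log4j-%{+YYYY.MM.dd}"
    document_id => "%{[@timestamp]}_%{[id]}"
    action => "update"
    # insert if the _id is new, otherwise update the existing document
    doc_as_upsert => true
  }
}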
@ptamba thanks for the reply.
For the second question, I don't want ES to update the document. Instead, I want Logstash to ignore the duplicate. Is there any way I can do that?
Yes, the action option above does prevent the duplicates from being indexed (sketched below). But since this happens in the output plugin, the duplicates still pass through all my filter plugin operations, which is redundant.
Is there any way to identify a duplicate before the filter plugins and drop it?
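For reference, the output option in question is along these lines (index and ID format are placeholders):

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "log4j-%{+YYYY.MM.dd}"
    document_id => "%{[@timestamp]}_%{[id]}"
    # "create" rejects events whose _id already exists instead of overwriting them,
    # but the duplicates have already gone through every filter by this point
    action => "create"
  }
}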
@ptamba Yes, I can use the Filebeat agent and point its output to my Logstash instance.
But in my case there are hundreds of users. Do I need to manually install Filebeat on each system? Is there a simpler way to do this?
Thank You