We have a Logstash configuration for our logging application to show the logs on a Kibana dashboard, but we are getting duplicate entries in Elasticsearch. We applied the workaround mentioned in the blog below.
Our configuration is as follows:
We have an application which generates 100-150 MB of logs every day, so we stream the logs from the application to RabbitMQ first, and after filtering we put the logs into Elasticsearch using Logstash configuration files.
Can you suggest how to solve this issue?
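For context, the workaround we applied is the usual fingerprint-based deduplication, roughly along these lines (a simplified sketch rather than our exact config; the field names, key and index name here are placeholders):

filter {
  fingerprint {
    # hash the log line so identical lines always produce the same id
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key => "dedup-key"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "applogs-%{+YYYY.MM.dd}"
    # use the fingerprint as the document id so a duplicate event
    # overwrites the existing document instead of adding a new one
    document_id => "%{[@metadata][fingerprint]}"
  }
}

Even with the fingerprint in place, duplicates still show up, which is why we suspect the events are being read or delivered more than once upstream.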
We have a datacenter where we have deployed a Java application which generates the application logs.
We gather these logs into a particular location on a box where we have configured the ELK stack along with RabbitMQ.
From the stored log location, Logstash pushes the logs to a RabbitMQ queue, and from there, after filtering, Logstash pushes them into Elasticsearch.
There is some issue with the file input plugin of Logstash: it is re-reading lines from the application log, whereas we only need it to read the delta changes in the file. As you can see in the feeder configuration, we are already using the fingerprint plugin, but the issue persists. Please check the above configuration, let us know if anything is used incorrectly, and suggest changes as necessary.
In the feeder configuration we read the input application log files from the local filesystem (ext4 on Ubuntu); the worker consumes from the RabbitMQ queue.
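For reference, the feeder side has roughly this shape (a minimal sketch; the paths, host and exchange names are placeholders, not our real values):

input {
  file {
    # application log files collected from the Java application
    path => "/var/log/app/*.log"
    start_position => "beginning"
    # explicit sincedb location so the read offsets are easy to inspect
    sincedb_path => "/var/lib/logstash/sincedb_app"
  }
}
output {
  rabbitmq {
    host => "localhost"
    exchange => "app-logs"
    exchange_type => "direct"
    key => "app.logs"
    durable => true
    persistent => true
  }
}

As far as we understand, start_position only applies the first time a file is seen; after that the sincedb offset decides where reading resumes, so only the delta should be picked up.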
If you are replacing a file, it will appear as a new file to Logstash because the inode is different, even if it has the same name. If the file you are replacing it with contains both old and new data, all of it will be reprocessed. Ideally you should add new data by appending to the existing file rather than replacing it.
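If you want to verify what the file input has recorded, point it at an explicit sincedb location and look at that file after each run (paths here are only examples):

input {
  file {
    path => "/var/log/app/app.log"
    # a fixed, known location makes the tracking file easy to find
    sincedb_path => "/var/lib/logstash/sincedb_app"
  }
}

Each line in the sincedb records the inode, the device numbers and the byte offset read so far (newer versions of the file input also append a timestamp and the path), so you can see directly whether a new inode appears after the file is rewritten.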
Hi,
I tested the entire configuration with one input file (generated by a Jenkins job). This file is the input to the Logstash feeder configuration, which sends it to RabbitMQ; later the Logstash worker receives these messages from RabbitMQ and sends them to Elasticsearch.
This is what I observed:
The first time I run the Jenkins job it creates the input file --> Logstash picks up the contents of this file --> records the offset in the sincedb and sends them to RabbitMQ --> the Logstash worker picks the same up from RabbitMQ --> sends them to Elasticsearch.
Let's say I have 3 log lines the first time; I can see 3 logs in Kibana as well.
When I run the Jenkins job again it adds, say, 2 more lines to the input file --> Logstash picks up the contents from the input file --> updates the offset in the sincedb and sends them to RabbitMQ,
but now I see a total of 8 logs in Kibana.
I can confirm that the inode of the file does not change after the Jenkins job runs and that the sincedb offset is correct, so how are there multiple entries in Elasticsearch? I suspect RabbitMQ is re-delivering the already-delivered messages to Elasticsearch. Is this possible?
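In case it is relevant, the worker input has roughly this shape (a sketch; host and queue names are placeholders):

input {
  rabbitmq {
    host => "localhost"
    queue => "app-logs"
    durable => true
    # with ack enabled (the default), messages that were delivered but not yet
    # acknowledged can be re-delivered if the connection drops or the worker restarts
    ack => true
    prefetch_count => 256
  }
}

My understanding is that RabbitMQ only re-delivers messages that were never acknowledged, for example if the worker stops mid-batch. So if re-delivery is the cause, keying the Elasticsearch documents on the fingerprint (as in the sketch in the first post) should at least make the duplicates overwrite each other rather than pile up.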