I am running multiple LSF (logstash-forwarder) instances on a single host. Unfortunately I started them from the same working directory, so they are sharing a .logstash-forwarder file.
I presume this is incorrect and I will rectify it, but could this error give rise to the duplicate events that I subsequently see in Elasticsearch?
Hmm. That probably results in them periodically trampling on each other's state in the shared .logstash-forwarder file, so if you restart both then one of them is going to lose its state. You should grab a copy of the file, shut down the instance that created the file, shut down the other one after you've made sure that it has flushed its internal state and overwritten the state file, then restart the LSF instances from different directories to which you've moved each instance's state file.
Thank you again for your reply. I have refactored my scripts to ensure that each LF instance now starts in a unique working directory.
BTW, I also found a workaround for this issue: I use the Logstash fingerprint filter to compute a hash of the message and then use the fingerprint value as the document_id when submitting the document to Elasticsearch. This ensures no exact duplicates make it into Elasticsearch.
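For anyone wanting to try the same workaround, a minimal sketch of what such a Logstash config could look like is below. This is not the poster's actual config: the field name "fingerprint", the choice of SHA1, and the key value are assumptions for illustration, and option names may differ between Logstash versions, so check the docs for the version you run.

```
filter {
  fingerprint {
    # Hash the raw log line; "message" is the default source field.
    source => "message"
    # Field that receives the hash (name chosen here for illustration).
    target => "fingerprint"
    # SHA1 is computed as a keyed hash, so a key is supplied; the value is a placeholder.
    method => "SHA1"
    key    => "any-fixed-string"
  }
}

output {
  elasticsearch {
    # Use the hash as the document ID, so a re-sent event overwrites
    # the existing document instead of creating a second one.
    document_id => "%{fingerprint}"
  }
}
```

Note that this only deduplicates within a single index, and that two genuinely distinct events with byte-identical messages would also collapse into one document unless something like the timestamp or host is included in the hashed input.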
Hi Magnus,
Can you elaborate on how to do so?
In my use case, LSF is installed from an rpm to /opt/logstash-forwarder and is started with the init script (/etc/init.d/logstash-forwarder).
Yes, but I suspect they clash because they both use /var/lib/logstash-forwarder/.logstash-forwarder as a state file.
Do I understand correctly that I should stop using the LSF init script and run the instances from different directories? Does this mean that the state file gets created in the directory I'm running the command from?
Yes, LSF creates the state file in the current directory. That doesn't mean that you need to stop using the init script, but you may have to modify it.
I've modified the init script to reflect the second LSF.
I also modified the /etc/default/logstash-forwarder-OTHER file and set 'chdir' to a different directory.
Should I expect to see the second LSF state file in the 'chdir' directory?
I'm using lsof to determine which state file each LSF process is using.