When using a file input with an elasticsearch output,
how do I make sure that each of the lines in the log files I generate, and have Logstash process, only gets added once?
If I add another output, like S3, some time later, will Logstash send the older data to that output too? Or will the data need to be reprocessed from scratch, with the sincedb erased, and will I need to clear the Elasticsearch indexes?
how do I make sure that each of the lines in the log files I generate, and have Logstash process, only gets added once?
Logstash tracks the inodes and read offsets of the files it has processed (in the sincedb), so unless you rotate the files by copying them to new files and have start_position => "beginning" set, you should be fine.
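For reference, a minimal file input sketch, assuming a recent Logstash; the log path and sincedb location are placeholders, not from the original post:

```
input {
  file {
    path => "/var/log/myapp/*.log"
    # Read newly discovered files from the top; files already recorded
    # in the sincedb resume from their saved offset regardless.
    start_position => "beginning"
    # An explicit sincedb location makes it easy to find (and delete) later.
    sincedb_path => "/var/lib/logstash/sincedb-myapp"
  }
}
```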
If I add another output, like S3, some time later, will Logstash send the older data to that output too? Or will the data need to be reprocessed from scratch, with the sincedb erased, and will I need to clear the Elasticsearch indexes?
When you add an additional output, only data processed by Logstash from then on will reach the new output. So yeah, in this case you need to clear the sincedb and reprocess the files, and you'll also want to clear the Elasticsearch indexes first so the replayed events don't get indexed twice.
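A sketch of what the output section might look like with both destinations; the hosts, index pattern, bucket, and region are placeholder assumptions:

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myapp-%{+YYYY.MM.dd}"
  }
  s3 {
    # Placeholder bucket/region; AWS credentials come from the plugin's
    # usual settings (access_key_id/secret_access_key) or the environment.
    bucket => "myapp-log-archive"
    region => "us-east-1"
    codec  => "json_lines"
  }
}
```

To reprocess from scratch: stop Logstash, delete the sincedb file (the sincedb_path above, if set explicitly), delete the existing Elasticsearch indexes, then restart Logstash so the files are read again from the beginning into both outputs.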