I am building an application that regularly reads log files from a location using Logstash and indexes them into Elasticsearch. There are a lot of files, and we can expect daily feeds to the source.
Due to some internal maintenance, Logstash sometimes shuts down in the middle of processing. When we restart it, it does not resume from where it left off. I end up clearing all the .sincedb files and restarting the whole process, which feels inefficient.
I tried to use RabbitMQ, but I may face the same issue there too.
I currently have a process that writes the files Logstash has processed to a location from the Logstash output. I also created a batch process that queries Elasticsearch for each file and deletes the file once its indexing is confirmed. That does not help much, though, if a file was only partially indexed.
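To make the partial-indexing case concrete, here is a minimal sketch of how such a batch check could tell a fully indexed file from a partially indexed one. It assumes one event per line (no multiline codec) and that each indexed event records its source file path. `count_indexed` is a hypothetical callable you would supply, for example a wrapper around an Elasticsearch count query filtered on the path field. It is not part of any library.

```python
from pathlib import Path
from typing import Callable


def count_lines(path: Path) -> int:
    """Count non-empty lines, i.e. the number of events expected in the index."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if line.strip())


def classify_file(path: Path, count_indexed: Callable[[str], int]) -> str:
    """Return 'complete', 'partial' or 'missing' for one source file.

    count_indexed is a hypothetical hook: given the file path, it returns
    how many documents from that file are already in Elasticsearch.
    """
    expected = count_lines(path)
    indexed = count_indexed(str(path))
    if indexed >= expected:
        return "complete"  # safe to delete the source file
    if indexed == 0:
        return "missing"   # nothing indexed yet, reprocess the whole file
    return "partial"       # some events indexed, do not delete
```

Only files classified as `complete` would be deleted. Note that if a file is replayed from the start, duplicate events can inflate the indexed count, so this check is only reliable when duplicates are avoided (for instance with a deterministic document ID).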
Where does it start after the restart? What's in the sincedb files when that happens? Are the input files being rotated while Logstash is down?
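For anyone wanting to answer the sincedb question, the following is a rough sketch for inspecting a sincedb file. It assumes the commonly seen layout of whitespace-separated fields per line: inode, major device number, minor device number and byte offset, with newer versions of the file input appending a timestamp and the file path. Treat that layout as an assumption and check it against your own files. The names `SincedbEntry`, `parse_sincedb_line` and `remaining_bytes` are made up for this example.

```python
from typing import NamedTuple, Optional


class SincedbEntry(NamedTuple):
    inode: str           # kept as text, the identifier format varies by platform
    major: int           # major device number
    minor: int           # minor device number
    offset: int          # byte position Logstash recorded for this file
    path: Optional[str]  # only present in the newer, longer format


def parse_sincedb_line(line: str) -> Optional[SincedbEntry]:
    """Parse one sincedb line; return None for blank or too-short lines."""
    # Split into at most 6 fields so a path containing spaces stays intact.
    fields = line.strip().split(None, 5)
    if len(fields) < 4:
        return None
    path = fields[5] if len(fields) == 6 else None
    return SincedbEntry(fields[0], int(fields[1]), int(fields[2]),
                        int(fields[3]), path)


def remaining_bytes(entry: SincedbEntry, file_size: int) -> int:
    """Bytes of the file that lie beyond the recorded offset."""
    return max(file_size - entry.offset, 0)
```

Comparing each recorded offset with the current file size (for example from `os.stat(path).st_size`) shows whether Logstash believes it has already read a file to the end or stopped partway through.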
After processing a lot of data, if Logstash shuts down and I restart it, it does not even start. I am not sure whether it is still working through the previously processed files and comparing them with the source folder.
I will probably have to look at it the next time it happens. I saw a blog reply of yours about reading the .sincedb files.
No, the files are not rotated. We get a continuous feed into the same source folders.
[quote="magnusbaeck, post:2, topic:56917"]
I tried to use RabbitMQ, but I may face the same issue there too.
[/quote]
The reason I think it may not work is below.
I created two Logstash processes: one to read the files and write to RabbitMQ, and another to read from RabbitMQ and index into Elasticsearch.
The problem I anticipate is that if Logstash shuts down in the middle of process 1, there will again be no tracking of what was read. I may have to repeat the same procedure I followed without RabbitMQ.