I am trying to use Logstash 5.5 to analyze archived (.gz) files generated every minute. Each .gz file contains a CSV file. My .conf file looks like this:
input {
  file {
    type => "gzip"
    path => [ "C:\data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}
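For what it's worth, a couple of things in that config look suspicious to me. A sketch of how it might be written instead (not a confirmed fix): the file input expects forward slashes in `path` even on Windows, and `sincedb_path` should point to a writable file where Logstash can persist its read position, not an arbitrary string like `"gzip"`. The sincedb location below is a hypothetical example.

```
input {
  file {
    type => "gzip"
    # The file input expects forward slashes, even on Windows
    path => [ "C:/data*.gz" ]
    start_position => "beginning"
    # sincedb_path should be a writable file, not an arbitrary string
    sincedb_path => "C:/logstash/sincedb-gz"
    codec => gzip_lines
  }
}
```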
Initially I was getting an error about the missing gzip_lines plugin, so I installed it. After installing the plugin, Logstash reports "Successfully started Logstash API endpoint", but nothing gets indexed. The Logstash logs show no data being indexed into Elasticsearch, and when I look for the index in Kibana, it is not there. So Logstash is not putting data into Elasticsearch.
Maybe I am using the wrong configuration. Please suggest.
Following different threads (https://www.elastic.co/guide/en/logstash/current/offline-plugins.html and "Read a gzip file with gzip_lines codec"), I built the gzip_lines plugin on my offline Linux server. I also stopped referencing the .gz files directly from my Logstash conf file; instead I listed their absolute paths, one per line, in a .txt file. But Logstash is still not working as expected: it reads that text file line by line instead of parsing the .gz files listed in it. Can anyone help with this?
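A workaround I have seen suggested (a sketch, assuming you can run a shell pipeline such as `zcat /path/to/file.gz | logstash -f stdin.conf`): decompress outside Logstash and feed the plain CSV lines into a stdin input, sidestepping the gzip_lines codec entirely. The Elasticsearch host below is a placeholder.

```
# stdin.conf -- hypothetical pipeline fed by: zcat file.gz | logstash -f stdin.conf
input {
  stdin { }                                      # already-decompressed CSV lines
}
filter {
  csv { }                                        # parse each line as CSV columns
}
output {
  elasticsearch { hosts => ["localhost:9200"] }  # placeholder host
}
```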
Finally got it working. But one more question: gzip_lines does not tail only the latest .gz entry added to the .txt file; it re-processes all the files listed in that .txt file. Is there a specific reason for that? I want only the latest .gz entry to be processed by the gzip_lines plugin. Any thoughts on this?
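One possible explanation, offered as a guess: the file input only skips previously read lines if its sincedb file survives between runs and the .txt file is appended to rather than rewritten (rewriting can change the inode, which makes Logstash treat it as a new file and re-read it from the start). A sketch with a persistent sincedb location (the paths are hypothetical):

```
input {
  file {
    # Append new .gz paths to this list file; avoid rewriting it in place
    path => "C:/gz-list.txt"
    start_position => "beginning"
    # Persisting the sincedb lets Logstash resume after the last read line
    sincedb_path => "C:/logstash/sincedb-gzlist"
  }
}
```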