Using Logstash 5.5 for reading .gz files


(shashwat) #1

Hi,

I am trying to use Logstash 5.5 to analyze archived (.gz) files that are generated every minute. Each .gz file contains a CSV file. My .conf file looks like this:
input {
  file {
    type => "gzip"
    path => [ "C:\data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}

filter {
  csv {
    separator => ","
    columns => ["COL1","COL2","COL3","COL4","COL5","COL6","COL7"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "mydata"
    document_type => "zipdata"
  }
  stdout {}
}

Initially I was getting an error about the missing gzip_lines plugin, so I installed it. After installing the plugin, Logstash reports "Successfully started Logstash API endpoint", but nothing gets indexed. The Logstash logs show no indexing activity in Elasticsearch, and when I look for the index in Kibana, it is not there. So Logstash is not sending any data to Elasticsearch.

Maybe I am using the wrong configuration. Please suggest a fix.

Any help appreciated.


(shashwat) #3

I found one issue with my configuration: I was using the wrong slash in the path attribute. After correcting it, I can see Logstash start reading each of the .gz files, but it reports an error while reading them. Pasting the log here:
observe_read_file: general error reading C:/data1.csv.gz - error: java.lang.IllegalArgumentException: Object: ?Gq?Y data1.csv [raw gzip bytes omitted]
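
For reference, the slash fix described above amounts to using forward slashes in the path option, even on Windows. A minimal sketch of the corrected input block follows; the glob `C:/data*.gz` is an assumption inferred from the file name in the log, and the other settings are carried over unchanged from the first post. It only shows the path change and does not address the read error above.

```
input {
  file {
    type => "gzip"
    # Forward slashes instead of backslashes; this glob is assumed from the log line
    path => [ "C:/data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}
```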


(shashwat) #4

Following other threads (https://www.elastic.co/guide/en/logstash/current/offline-plugins.html and "Read a gzip file with gzip_lines codec"), I built the gzip_lines plugin on my offline Linux server. Also, instead of referencing the .gz files directly in my Logstash .conf file, I listed their absolute paths in a .txt file, one per line. However, Logstash is still not working as expected: it reads the text file line by line but does not parse the .gz files listed in it. Can anyone help with this?


(shashwat) #5

Finally got it working. One more question, though: gzip_lines does not tail the .txt file for the most recently added .gz entry; instead it reprocesses all the files listed in the .txt file. Is there a specific reason for that? I want only the latest .gz entry to be processed by the gzip_lines Logstash plugin. Any thoughts on this?
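
One possible cause, offered as a guess rather than a confirmed diagnosis: the file input records how far it has read each file in its sincedb, and `start_position => "beginning"` only applies the first time a file is seen. If the .txt list is rewritten or replaced instead of appended to, it can look like a new file and be read again from the start. A minimal sketch of an input that tails the list file, assuming new .gz paths are only ever appended to it; both paths below are hypothetical placeholders, and `codec => gzip_lines` simply mirrors the approach described in this thread:

```
input {
  file {
    # Hypothetical path to the .txt file listing one .gz path per line
    path => [ "/path/to/gz_list.txt" ]
    # Applies only on first contact with a file; later reads resume from the sincedb position
    start_position => "beginning"
    # Hypothetical explicit location so the recorded position persists across restarts
    sincedb_path => "/path/to/gz_list.sincedb"
    codec => gzip_lines
  }
}
```

With a setup like this, appending a line to the list (rather than saving the whole file from an editor, which may replace it) would be the thing to try first.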

