Using Logstash 5.5 for reading .gz files

shashwat · August 22, 2017, 4:59pm

Hi,

I am trying to use Logstash 5.5 to analyze archived (.gz) files that are generated every minute. Each .gz file contains a CSV file. My .conf file looks like this:
input {
  file {
    type => "gzip"
    path => [ "C:\data*.gz" ]
    start_position => "beginning"
    sincedb_path => "gzip"
    codec => gzip_lines
  }
}

filter {
  csv {
    separator => ","
    columns => ["COL1","COL2","COL3","COL4","COL5","COL6","COL7"]
  }
}

output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "mydata"
    document_type => "zipdata"
  }
  stdout {}
}

Initially I was getting an error about the missing gzip_lines plugin, so I installed it. After installing the plugin, Logstash reports "Successfully started Logstash API endpoint", but nothing gets indexed. The Logstash logs show no data being indexed into Elasticsearch, and when I look for the index in Kibana, it is not there. So Logstash is not sending any data to Elasticsearch.

Maybe I am using the wrong configuration. Please suggest.

Any help appreciated.

shashwat · August 25, 2017, 6:08am

Found one issue with my configuration: I was using the wrong slash in the path setting (backslash instead of forward slash). After correcting it, I can see Logstash start reading each .gz file, but it reports an error while reading. Pasting the log here:
observe_read_file: general error reading C:/data1.csv.gz - error: java.lang.IllegalArgumentException: Object: ?Gq?Y data1.csv [remaining binary gzip data omitted]
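
The `Object:` payload in the error above looks like raw compressed data rather than CSV text: the original file name `data1.csv` is visible near its start, which is where a gzip header stores it. That suggests the content was being handed over without having been decompressed first. Below is a minimal Python sketch, outside Logstash, of how raw gzip bytes differ from the decompressed CSV lines. The sample contents are made up for illustration, and the helper names `looks_like_gzip` and `read_csv_rows` are hypothetical.

```python
import csv
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def looks_like_gzip(raw: bytes) -> bool:
    """Return True if the bytes start with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


def read_csv_rows(raw: bytes) -> list:
    """Decompress gzip bytes and parse the result as comma-separated rows."""
    text = gzip.decompress(raw).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical stand-in for data1.csv.gz; the CSV contents are invented.
buf = io.BytesIO()
with gzip.GzipFile(filename="data1.csv", mode="wb", fileobj=buf) as gz:
    gz.write(b"a,b,c\n1,2,3\n")
sample = buf.getvalue()

print(looks_like_gzip(sample))   # raw bytes carry the gzip magic number
print(b"data1.csv" in sample)    # the header embeds the original file name
print(read_csv_rows(sample))     # readable rows appear only after decompression
```

If a file passes the `looks_like_gzip` check and decompresses cleanly this way, the archive itself is probably fine and the problem is more likely in how it is being read.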

shashwat · August 26, 2017, 6:13am

Following the different threads (https://www.elastic.co/guide/en/logstash/current/offline-plugins.html and "Read a gzip file with gzip_lines codec"), I built the gzip_lines plugin on my Linux (offline) server. Also, instead of referencing the .gz files directly in my Logstash conf file, I listed their absolute paths, one per line, in a .txt file. But Logstash is still not working as expected: it reads that text file line by line and does not parse the .gz files listed in it. Can anyone help with this?

shashwat · August 26, 2017, 3:10pm

Finally got it working. But one more question: gzip_lines does not tail only the most recently added .gz entry in the .txt file; it re-processes all the files listed in the .txt file. Is there a specific reason for that? I want only the latest .gz entry to be processed by the gzip_lines plugin. Any thoughts on this?
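
The thread does not say why every listed file is re-processed. One possible workaround outside Logstash is to keep a record of the paths already handled and skip them on the next pass. The Python sketch below illustrates that idea only; it is not how the gzip_lines plugin works. The list file, the state file, and the function names `new_entries` and `process_new` are all hypothetical.

```python
import gzip
from pathlib import Path


def new_entries(list_file: Path, state_file: Path) -> list:
    """Return paths listed in list_file that are not yet recorded in state_file."""
    seen = set(state_file.read_text().splitlines()) if state_file.exists() else set()
    listed = [ln.strip() for ln in list_file.read_text().splitlines() if ln.strip()]
    return [p for p in listed if p not in seen]


def process_new(list_file: Path, state_file: Path) -> list:
    """Read the lines of each new .gz file, then record that file as processed."""
    lines = []
    for gz_path in new_entries(list_file, state_file):
        # Open in text mode so the decompressed content comes back as strings.
        with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
            lines.extend(line.rstrip("\n") for line in fh)
        # Append to the state file only after the file was read successfully.
        with state_file.open("a") as st:
            st.write(gz_path + "\n")
    return lines
```

Calling `process_new` repeatedly with the same two files returns only the lines from .gz entries added to the list since the previous call.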

system · September 23, 2017, 3:10pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
