How to parse a dynamic CSV file regularly

vikas_gopal · June 11, 2015, 6:09am

Hi Experts,

I am parsing a CSV file that keeps getting updated. It is not static: new data is added to the CSV every 30 minutes or so. My question is how I can handle this situation. I cannot afford to create a new index each time. Is it possible to keep a single index, have Logstash parse this file regularly, and see the new records or events in Kibana?

For now, even if I add a single new line to an existing CSV, I have to create a new index.

Thanks
VG

magnusbaeck · June 11, 2015, 6:25am

This is what Logstash's file input plugin does out of the box. It reads new data from log files every second by default. Whether you have one or more Elasticsearch indexes has nothing to do with this.
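
As a rough sketch of the settings involved (the path is the one from the configuration posted in this thread; the sincedb location is a hypothetical value chosen for illustration, not something the posters used): the file input polls watched files every `stat_interval` seconds and records how far it has read in a sincedb file, and `start_position` only applies to files it has not seen before.

```
input {
  file {
    path => "E:/VGCSV/test13.csv"
    # Only honoured the first time Logstash sees a file;
    # afterwards reading resumes from the position stored in the sincedb
    start_position => "beginning"
    # How often, in seconds, watched files are checked for new data (1 is the plugin default)
    stat_interval => 1
    # Where read positions are stored; this location is an assumption for illustration
    sincedb_path => "E:/VGCSV/.sincedb_test13"
  }
}
```

With a setup like this, lines appended to the file should be picked up without restarting Logstash or changing the index name.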

vikas_gopal · June 11, 2015, 6:33am

Thank you for your prompt response. Yes, you are right about the file input plugin, and that is what I am using.

input {
  file {
    path => "E:/VGCSV/test13.csv"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["Name","Roll No","Class","Marks"]
    separator => ","
  }
  # Drop the CSV header row
  if [Name] == "Name" {
    drop { }
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "csv13"
    workers => 1
  }
}

As you can see, I am using a CSV file named "test13". If I add a new record to this file, it does not show up in Kibana. To see the new record in Kibana, I have to stop Logstash and ES and change both the name of the CSV file and the index name. So for every new record I have to rename the file, e.g. to "test14", and the index to "csv14"; only then is the new file parsed and the new records appear in Kibana.

magnusbaeck · June 11, 2015, 6:41am

Never mind Kibana for now. Have you used ES's REST-based query interface to verify that the message hasn't been inserted into the index? You get a good overview of your ES cluster with the kopf plugin. For example, it'll tell you if the number of documents in an index increases.

Have you managed to somehow disable automatic refreshes for indexes (the refresh_interval setting)? Check the index settings (also accessible via kopf).
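
To separate "Logstash did not read the line" from "Elasticsearch did not index it", one option (an addition here, not something suggested in the thread) is to temporarily print every event to the console alongside the Elasticsearch output. A minimal sketch, reusing the host and index name from the configuration posted above:

```
output {
  # Prints each event Logstash emits, so a newly appended CSV line
  # should show up in the console if the file input picked it up
  stdout { codec => rubydebug }
  elasticsearch {
    host => "localhost"
    index => "csv13"
  }
}
```

If an appended line shows up in the console but not in the index, the problem is more likely on the Elasticsearch side than in the file input.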

vikas_gopal · June 11, 2015, 7:15am

Cool, I like the idea. I will download the kopf plugin and see whether the number of documents in the index increases.

vikas_gopal · June 11, 2015, 8:05am

Wow! I have installed the plugin, and I can see that the document count increases after an insertion and that the new record appears in Kibana. The index refresh interval was set to 20s, which I changed to 1s.

Now, I can see that I can delete an existing index from this plugin. But after deleting it, why can't I create an index with the same name from the same CSV file? I tried, and every time it only adds a new node to the cluster, with 0 indices. So again I have to change the name of the CSV file along with the index name.

magnusbaeck · June 11, 2015, 8:21am

What's the message you get when you try to recreate an index that you previously deleted?

Instead of deleting the index you could just delete all documents in it.

vikas_gopal · June 11, 2015, 9:11am

This is what I can see in ES, I mean the last lines of the command-line output:
[29] indices into cluster_state
[2015-06-11 14:35:50,788][INFO ][cluster.service ] [Hawkshaw] added {[logstash-LP-54EE752450D4-5744-4082][l_GJk1CzQKuJGDGkY9qvIQ][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true},}, reason: zen-disco-receive(join from node[[logstash-LP-54EE752450D4-5744-4082][l_GJk1CzQKuJGDGkY9qvIQ][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true}])

And the last lines of the ES log:

[2015-06-11 14:44:59,253][INFO ][cluster.service ] [Hawkshaw] added {[logstash-LP-54EE752450D4-324-4080][b5G4MTu1QzOGDxIcuUnozA][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true},}, reason: zen-disco-receive(join from node[[logstash-LP-54EE752450D4-324-4080][b5G4MTu1QzOGDxIcuUnozA][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true}])
[2015-06-11 14:45:00,476][INFO ][cluster.metadata ] [Hawkshaw] [map9] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [default]
[2015-06-11 14:45:01,965][INFO ][cluster.metadata ] [Hawkshaw] [map9] update_mapping [logs] (dynamic)

I have 29 indices in total, and "Hawkshaw" is my current ES node.
