How to parse a dynamic CSV file regularly


(Vikas Gopal) #1

Hi Experts,

I am parsing a CSV file that keeps getting updated. It is not static: new data is added to the CSV every 30 minutes or so. My question is how I can handle this situation. I cannot afford to create a new index each time. Is it possible to have a single index, have Logstash parse this file regularly, and see the new records or events in Kibana?

For now, even if I add a single new line to an existing CSV, I have to create a new index.

Thanks
VG


(Magnus Bäck) #2

This is what Logstash's file input plugin does out of the box. It reads new data from log files every second by default. Whether you have one or more Elasticsearch indexes has nothing to do with this.
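To illustrate the idea of picking up only newly appended data, here is a minimal Python sketch. It is not Logstash's actual implementation, just the general technique of remembering a byte offset and reading from there on each poll. The function name `read_new_lines`, the temporary file, and the sample rows are all made up for the example.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete lines appended since `offset`, new offset)."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only up to the last newline, so a half-written row
    # is left in place for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Demo on a temporary CSV file (hypothetical sample rows).
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "test.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("Name,Roll No,Class,Marks\nAsha,1,10,88\n")
    first, pos = read_new_lines(csv_path, 0)

    # Simulate a new record being appended later.
    with open(csv_path, "a", newline="") as f:
        f.write("Ravi,2,10,91\n")
    second, pos = read_new_lines(csv_path, pos)
```

In the demo, the second call returns only the appended row, because the offset from the first call is carried over. A real tailer would also persist that offset between runs and call the function in a polling loop.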


(Vikas Gopal) #3

Thank you for your prompt response. Yes, you are right about the file input plugin, and that is what I am using:

input {
  file {
    path => "E:/VGCSV/test13.csv"
    # Read the file from the top the first time it is seen
    start_position => "beginning"
  }
}

filter {
  csv {
    columns => ["Name","Roll No","Class","Marks"]
    separator => ","
  }
  # Drop the CSV header row
  if [Name] == "Name" {
    drop { }
  }
}

output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "csv13"
    workers => 1
  }
}

As you can see, I am using a CSV file named "test13". If I add a new record to this file, that record does not show up in Kibana. To see the new record in Kibana I have to stop Logstash and ES and change the name of the CSV file along with the index name. So for every new record I have to rename the file to something like "test14" and the index to "csv14". Only then does Logstash parse the new file, and only then can I see the new records in Kibana.


(Magnus Bäck) #4

Never mind Kibana for now. Have you used ES's REST-based query interface to verify that the message hasn't been inserted into the index? You get a good overview of your ES cluster with the kopf plugin. For example, it'll tell you if the number of documents in an index increases.
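As a rough sketch of checking the document count over REST, the Python snippet below queries the `_count` endpoint of an index. The host, port 9200, and the index name `csv13` are assumptions taken from the configuration in this thread, and the helper names are invented for the example.

```python
import json
import urllib.request


def count_url(index, host="localhost", port=9200):
    """Build the URL of the count endpoint for one index."""
    return f"http://{host}:{port}/{index}/_count"


def parse_count(body):
    """Extract the document count from a _count response body."""
    return json.loads(body)["count"]


def fetch_count(index, host="localhost", port=9200):
    """Ask a running Elasticsearch node how many documents the index holds."""
    with urllib.request.urlopen(count_url(index, host, port)) as resp:
        return parse_count(resp.read())
```

Calling `fetch_count("csv13")` before and after appending a row to the CSV should show whether the new event reached the index, independently of what Kibana displays.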

Have you somehow managed to disable automatic refreshes for indexes (the refresh_interval setting)? Check the index settings (also accessible via kopf).
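For reference, the refresh interval can also be changed through the index settings endpoint. The sketch below only builds the HTTP request; the index name `csv13`, the `1s` value, and the helper name are assumptions for illustration.

```python
import json
import urllib.request


def refresh_interval_request(index, interval="1s", host="localhost", port=9200):
    """Build a PUT request that sets refresh_interval on one index."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = refresh_interval_request("csv13")
# To actually apply it against a running node:
# urllib.request.urlopen(req)
```

A GET on the same `_settings` URL returns the current settings, which is one way to confirm what the interval is set to.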


(Vikas Gopal) #5

Cool, I like the idea. I will download the kopf plugin and see whether the number of documents in the index increases.


(Vikas Gopal) #6

Wow! I have installed this plugin and I can see that the document count increases after an insertion, and the new records appear in Kibana. The index refresh interval was set to 20s, which I changed to 1s.

Now, I can see that I can delete an existing index from this plugin, but after deleting it, why can't I create an index with the same name using the same CSV file? I tried, and every time it adds a new node to the cluster which has 0 indices. So again I have to change the name of the CSV file along with the index name.


(Magnus Bäck) #7

What's the message you get when you try to recreate an index that you previously deleted?

Instead of deleting the index you could just delete all documents in it.
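One way to do that is a delete-by-query with a `match_all` query. The sketch below builds such a request for the `_delete_by_query` endpoint found in Elasticsearch 5.0 and later; older releases exposed delete-by-query differently, so check the documentation for your version. The index name `csv13` and the helper name are assumptions for illustration.

```python
import json
import urllib.request


def delete_all_docs_request(index, host="localhost", port=9200):
    """Build a POST request that deletes every document but keeps the index."""
    body = json.dumps({"query": {"match_all": {}}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/{index}/_delete_by_query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


delete_req = delete_all_docs_request("csv13")
# To actually run it against a node:
# urllib.request.urlopen(delete_req)
```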


(Vikas Gopal) #8

This is what I can see in ES, i.e. the last lines of the command-line output:
[29] indices into cluster_state
[2015-06-11 14:35:50,788][INFO ][cluster.service ] [Hawkshaw] added {[logstash-LP-54EE752450D4-5744-4082][l_GJk1CzQKuJGDGkY9qvIQ][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true},}, reason: zen-disco-receive(join from node[[logstash-LP-54EE752450D4-5744-4082][l_GJk1CzQKuJGDGkY9qvIQ][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true}])

And the last lines of the ES log:

[2015-06-11 14:44:59,253][INFO ][cluster.service ] [Hawkshaw] added {[logstash-LP-54EE752450D4-324-4080][b5G4MTu1QzOGDxIcuUnozA][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true},}, reason: zen-disco-receive(join from node[[logstash-LP-54EE752450D4-324-4080][b5G4MTu1QzOGDxIcuUnozA][LP-54EE752450D4][inet[/192.168.1.2:9301]]{data=false, client=true}])
[2015-06-11 14:45:00,476][INFO ][cluster.metadata ] [Hawkshaw] [map9] creating index, cause [auto(bulk api)], shards [5]/[1], mappings [default]
[2015-06-11 14:45:01,965][INFO ][cluster.metadata ] [Hawkshaw] [map9] update_mapping [logs] (dynamic)

I have 29 indices in total, and "Hawkshaw" is my current ES node.

