Does this conf file work for CSV import?


(Greg M) #1

I am trying to get Logstash to monitor a folder and import my CSV data into Elasticsearch so I can view it in Kibana. The Logstash, Elasticsearch, and Kibana instances are all up and running, but Logstash does not create the index. The log only shows:

Default settings used: Filter workers: 2
Logstash startup completed

My logstash.conf file contains the input folder, a csv filter with my column headers, and an output to my local (running) ES instance:

input {
  file {
    path => "/Datasets/*.csv"
    type => "core2"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["dollarsobligated", "signeddate", "ultimatecompletiondate", "contractactiontype", "typeofcontractpricing", "subcontractplan", "descriptionofcontractrequirement", "vendorname", "principalnaicscode", "modnumber", "fiscal_year", "idvpiid", "extentcompeted", "numberofoffersreceived", "typeofsetaside", "numberofemployees"]
    separator => ","
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => "localhost:9200"
    index => "mydata-%{+YYYY.MM.dd}"
    workers => 1
  }
  # stdout {
  #   codec => rubydebug
  # }
}


(Magnus Bäck) #2

Logstash is probably stuck tailing the input file. Shut down Logstash, delete the sincedb file, and start it again. See the file input documentation for more information about sincedb.


(Greg M) #3

Magnus, thank you. The sincedb file does not exist in any location.

best regards,

Greg Meadows (my LinkedIn profile) http://www.linkedin.com/in/gregmeadows
Red Cloud Services http://www.redcloudservices.com (click to see our
latest video)


(Magnus Bäck) #4

I'm pretty sure there's a sincedb file somewhere. If you increase the log level with --verbose it'll log the location of the file (or tell you if it can't create the file anywhere, which you'd need to address anyway).
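If you'd rather not hunt for the generated file, the file input also accepts an explicit `sincedb_path`. A minimal sketch reusing the input from the original post; the path `/Users/greg/.sincedb_datasets` is a hypothetical choice, and any location the Logstash user can write to should work:

```conf
input {
  file {
    path => "/Datasets/*.csv"
    type => "core2"
    start_position => "beginning"
    # Hypothetical location: Logstash records the read offset of each file here
    sincedb_path => "/Users/greg/.sincedb_datasets"
  }
}
```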


(Greg M) #5

Thanks, Magnus. Verbose is set and we see:

Greg-Meadowss-MacBook-Pro:logstash-2.0.0 greg$ sudo bin/logstash agent -f logstash.conf --verbose
Default settings used: Filter workers: 2
Registering file input {:path=>["/Datasets/*.csv"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/Users/greg/.sincedb_fb3a1da9efd4fe12fec475901ad215d1", :path=>["/Datasets/*.csv"], :level=>:info}
Worker threads expected: 2, worker threads started: 2 {:level=>:info}
Automatic template management enabled {:manage_template=>"true", :level=>:info}
Using mapping template {:template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"location"=>{"type"=>"geo_point"}}}}}}}, :level=>:info}
New Elasticsearch output {:hosts=>["localhost:9200"], :level=>:info}
Pipeline started {:level=>:info}

Does LS look good? If so, ES should be creating an index for this file (and any new CSV files), and I should see them show up in Kibana.

Note: I use sudo, but is that necessary?



(Magnus Bäck) #6

Does LS look good? If so, ES should be creating an index for this file (and any new CSV files), and I should see them show up in Kibana.

Logstash will only read the file from the beginning if it's a previously unknown file. If the sincedb file whose path you now know existed when you started Logstash, Logstash will resume from the file offset stored in it (and, most likely, just tail the file). To force Logstash to start from the beginning, delete the sincedb file while Logstash isn't running.
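While testing against the same files repeatedly, a common alternative to deleting the sincedb file before every run is to point `sincedb_path` at `/dev/null`, so offsets are never persisted. Combined with `start_position => "beginning"`, every restart should then reread the files from the top. A sketch assuming a Unix-like system such as the Mac used here; it is meant for testing only, because each restart also re-indexes every line and creates duplicate documents:

```conf
input {
  file {
    path => "/Datasets/*.csv"
    type => "core2"
    start_position => "beginning"
    # Offsets are discarded, so every file looks new on each start.
    # Testing only: re-reading re-indexes (duplicates) all events.
    sincedb_path => "/dev/null"
  }
}
```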

The part of the log that you didn't quote here will include information about the current file offset.

Note: I use sudo, but is that necessary?

No, not if the greg user can read the files in /Datasets. Don't use sudo unless you have to.


(Greg M) #7

Thank you. I shut down Logstash, unhid the sincedb file, and deleted it. My ELK stack now has the following issues:

  1. My new Kibana shows "No results found", and my CSV fields are all of type string and hidden by Kibana (see screenshot). How do I get my imported fields to display in Kibana by default (and not be hidden)?

  2. localhost:9200/_cat/indices?v shows my new index "logstash-2015.10.29". However, the index name specified in my .conf file is being ignored (see the original post). I need to create multiple dashboards, so I'm not sure whether to simply bulk import all CSV files into the default index (logstash-*) or have Logstash create a separate index for each. How do I get Logstash to use my index names on import?

  3. How do I set the type for my date fields? Is the PUT Mapping API via the command line the only option? I have a few hundred fields prepared.

Logstash log on startup:

greg$ bin/logstash agent -f logstash.conf --verbose
Default settings used: Filter workers: 2
Registering file input {:path=>["/Datasets/*.csv"], :level=>:info}
No sincedb_path set, generating one based on the file path {:sincedb_path=>"/Users/greg/.sincedb_fb3a1da9efd4fe12fec475901ad215d1", :path=>["/Datasets/*.csv"], :level=>:info}
Worker threads expected: 2, worker threads started: 2 {:level=>:info}
Automatic template management enabled {:manage_template=>"true", :level=>:info}
Using mapping template {:template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "ignore_above"=>256}}}}}], "properties"=>{"@version"=>{"type"=>"string", "index"=>"not_analyzed"}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"location"=>{"type"=>"geo_point"}}}}}}}, :level=>:info}
New Elasticsearch output {:hosts=>["localhost:9200"], :level=>:info}
Pipeline started {:level=>:info}
Logstash startup completed


(Magnus Bäck) #8

localhost:9200/_cat/indices?v shows my new index "logstash-2015.10.29". However, the index name specified in my .conf file is being ignored (see the original post). I need to create multiple dashboards, so I'm not sure whether to simply bulk import all CSV files into the default index (logstash-*) or have Logstash create a separate index for each. How do I get Logstash to use my index names on import?

Is there any evidence that Logstash is importing any of the data? Does logstash-2015.10.29 contain the data you expect?
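One quick way to get that evidence is to re-enable the `stdout` output that is commented out in the original config, so each event, with all of its parsed CSV fields, is printed to the console as it is sent to Elasticsearch. A sketch based on the original output section:

```conf
output {
  elasticsearch {
    action => "index"
    hosts => ["localhost:9200"]
    index => "mydata-%{+YYYY.MM.dd}"
  }
  # Debugging aid: print every event, including the csv filter's fields
  stdout {
    codec => rubydebug
  }
}
```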

How do I set the type for my date fields? Is the PUT Mapping API via command line the only option? I have a few hundred fields prepared.

You should prepare an index template and make Logstash use it.
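A minimal sketch of pointing the `elasticsearch` output at a custom template, assuming the template JSON (which would declare the date and numeric types for the CSV columns) is saved at the hypothetical path `/Users/greg/mydata-template.json` and that its `"template"` pattern is `mydata-*`. This also bears on the index name question: Logstash's built-in template only matches `logstash-*` indices, so indices named `mydata-*` would not pick up its mappings (such as the `.raw` subfields) unless a template covering them is installed.

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mydata-%{+YYYY.MM.dd}"
    # Hypothetical path to a template whose "template" pattern is "mydata-*"
    template => "/Users/greg/mydata-template.json"
    template_name => "mydata"
    # Replace an existing template with the same name when Logstash starts
    template_overwrite => true
  }
}
```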


(Greg M) #9

Yes, it has my 60k docs (see the first index below). The fields are listed in Kibana; they just can't be used to build a dashboard, and "No results found" comes from the Discover tab. I have many more CSV files to import, so I need to understand how to get ELK to ingest them and display them in Kibana.

curl 'localhost:9200/_cat/indices?v'
health status index               pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2015.10.29   5   1      60272            0     57.4mb         57.4mb
yellow open   .kibana               1   1          2            0     14.1kb         14.1kb


(Greg M) #10

SOLVED. In Kibana's time picker, make sure to select "Last XX days" (the default is the last 15 minutes).
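Kibana's time filter applies to `@timestamp`, which by default is the time Logstash read each line, not a date from the CSV. If the contract dates should drive the time picker instead, a `date` filter placed after the existing `csv` filter can parse `signeddate` into `@timestamp`. A sketch assuming `signeddate` values look like `2015-10-29 00:00:00`; the real format isn't shown in the thread, so the pattern must be adjusted to the data. Rows that don't match are tagged `_dateparsefailure` and keep their read time. Note that `%{+YYYY.MM.dd}` in the index name is also derived from `@timestamp`, so events would then land in per-signing-date indices, and the Kibana time range would need to cover those dates.

```conf
filter {
  # ... the existing csv { ... } block from the original config goes here ...
  date {
    # Assumed layout of the signeddate column; adjust to the actual values
    match => ["signeddate", "yyyy-MM-dd HH:mm:ss"]
    # No target set, so the parsed date replaces @timestamp
  }
}
```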

