Loading a CSV file, basic question


(Christopher Curzon) #1

This is my first effort at getting Logstash to put data into Elasticsearch, so I think I must be missing something fairly obvious.

I have an addr.csv file with records in this format:

369|90045|123 ABC ST|LOS ANGELES|CA
368|90045|PVKA0010|LA|CA
etc...

and I have an addr02.conf file like this:

input {
  file {
    path => "/home/zed/logstash-2.1.0/load_ex/addr.csv"
    type => "addrtype"
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["REC_ID", "ZIP_CODE", "STR_ADDR", "CITY", "ST"]
    separator => "|"
  }
}
output {
  elasticsearch {
    action => "index"
    hosts => ["10.30.90.5"]
    index => "logstash"
    workers => 1
  }
}

But when I start Logstash like this:

bin/logstash -f load_ex/addr02.conf -l load_ex/addrX.log --verbose

nothing gets into ES. The first few lines of the log file are these (blank lines added for clarity):

{:timestamp=>"2016-01-05T17:34:01.559000-0800", :message=>"Registering file input", :path=>["/home/zed/logstash-2.1.0/load_ex/addr.csv"], :level=>:info}

{:timestamp=>"2016-01-05T17:34:01.563000-0800", :message=>"No sincedb_path set, generating one based on the file path", :sincedb_path=>"/home/zed/.sincedb_339edbe413a1e111968d1d43df97837f", :path=>["/home/zed/logstash-2.1.0/load_ex/addr.csv"], :level=>:info}

{:timestamp=>"2016-01-05T17:34:01.572000-0800", :message=>"Worker threads expected: 4, worker threads started: 4", :level=>:info}

{:timestamp=>"2016-01-05T17:34:01.581000-0800", :message=>"Using mapping template from", :path=>nil, :level=>:info}

{:timestamp=>"2016-01-05T17:34:01.841000-0800", :message=>"Attempting to install template", :manage_template=>{"template"=>"logstash-*", "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "omit_norms"=>true}, "dynamic_templates"=>[{"message_field"=>{"match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"string", "index"=>"analyzed", "omit_norms"=>true, "fielddata"=>{"format"=>"disabled"}, "fields"=>{"raw"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true, "ignore_above"=>256}}}}}, {"float_fields"=>{"match"=>"*", "match_mapping_type"=>"float", "mapping"=>{"type"=>"float", "doc_values"=>true}}}, {"double_fields"=>{"match"=>"*", "match_mapping_type"=>"double", "mapping"=>{"type"=>"double", "doc_values"=>true}}}, {"byte_fields"=>{"match"=>"*", "match_mapping_type"=>"byte", "mapping"=>{"type"=>"byte", "doc_values"=>true}}}, {"short_fields"=>{"match"=>"*", "match_mapping_type"=>"short", "mapping"=>{"type"=>"short", "doc_values"=>true}}}, {"integer_fields"=>{"match"=>"*", "match_mapping_type"=>"integer", "mapping"=>{"type"=>"integer", "doc_values"=>true}}}, {"long_fields"=>{"match"=>"*", "match_mapping_type"=>"long", "mapping"=>{"type"=>"long", "doc_values"=>true}}}, {"date_fields"=>{"match"=>"*", "match_mapping_type"=>"date", "mapping"=>{"type"=>"date", "doc_values"=>true}}}, {"geo_point_fields"=>{"match"=>"*", "match_mapping_type"=>"geo_point", "mapping"=>{"type"=>"geo_point", "doc_values"=>true}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "doc_values"=>true}, "@version"=>{"type"=>"string", "index"=>"not_analyzed", "doc_values"=>true}, "geoip"=>{"type"=>"object", "dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip", "doc_values"=>true}, "location"=>{"type"=>"geo_point", "doc_values"=>true}, "latitude"=>{"type"=>"float", "doc_values"=>true}, "longitude"=>{"type"=>"float", "doc_values"=>true}}}}}}}, :level=>:info}

Since it says

{:timestamp=>"2016-01-05T17:34:01.581000-0800", :message=>"Using mapping template from", :path=>nil, :level=>:info}

I'm guessing that I'm missing the correct template, or that I haven't specified the template correctly. This is a right-out-of-the-box attempt at using Logstash, and I don't remember seeing anything in the docs about specifying templates.

Thank you for any help you can give.

-- Chris Curzon


(Mark Walkom) #2

Is the problem that there is no data in ES?

If so, delete this file and then restart LS:

/home/zed/.sincedb_339edbe413a1e111968d1d43df97837f


(Christopher Curzon) #3

Thanks, I'm looking at that now.

How did you know the exact name of the file to delete?

Is there a good doc that describes Logstash from the operating system's point of view? I'd like to know which files get created, deleted, and so on in response to various operations.

Thanks.

-- Christopher Curzon


(Christopher Curzon) #4

No need to answer about the file path; I see it in the log. :)

Is this the mechanism that prevents Logstash from loading the same data multiple times? Maybe it checks whether the file exists and, if it does, doesn't reload the data?


(Mark Walkom) #5

Yep - https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-sincedb_path
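
To expand on the linked docs: the sincedb file records how far into each watched file Logstash has already read, so on restart it resumes from that position instead of only checking whether the sincedb file exists. start_position => "beginning" applies only to files Logstash has no record of. For repeated test loads, a minimal sketch (the sincedb_path line is an assumption added for illustration; it is not in the config posted above) is to point sincedb_path at /dev/null so that no position is remembered between runs:

```
input {
  file {
    path => "/home/zed/logstash-2.1.0/load_ex/addr.csv"
    type => "addrtype"
    start_position => "beginning"
    # Assumption, for testing only: discard the read position so the
    # file is treated as new each time Logstash starts.
    sincedb_path => "/dev/null"
  }
}
```

With this setting every restart re-reads addr.csv from the start, so the same records will be indexed again on each run. That is probably fine for experimenting, but the line should come out before any real load.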


(Christopher Curzon) #6

Thank you very much.

It feels like the pieces are starting to come together.

