Logstash unable to load CSV data into Elasticsearch on Windows

Hi,
I have been trying to load a CSV file into Elasticsearch using Logstash locally on Windows 10, with no success. The same configuration file sometimes works on a Mac, but even there it often fails without any useful errors.

logstash_config.conf

input {
    file {
        path => ["C:\Users\smiccv0\Documents\software\ELK_Stack\data\KPI.csv"]
        start_position => "beginning"
    }
}

filter{
    csv {
        separator => ","
        columns => ["Client_No","Date","Month","Num_Components","Num_Packages",
        "Num_Fallbacks","FallBack_Ratio","MeanTime","Num_Users"]
    }

    date {
        match => ["Date", "M/dd/yyyy", "MM/dd/yyyy", "M/d/yyyy"]
        timezone => "UTC"
        target => "@timestamp"
        add_field => {"debug" => "timestampMatched"}
    }

    mutate {convert => ["Num_Components","integer"]}
    mutate {convert => ["Num_Packages", "float"]}
    mutate {convert => ["Num_Fallbacks","integer"]}
    mutate {convert => ["FallBack_Ratio", "float"]}
    mutate {convert => ["Num_Users","integer"]}
    mutate {convert => ["MeanTime","integer"]}
}

output {  
    elasticsearch {
        hosts => ["localhost:9200"]
        index => "test_test"
    }
    stdout {codec => rubydebug}
}

logstash output

C:\Users\smiccv0\Documents\software\ELK_Stack\logstash-6.4.2>bin\logstash -f logstash_config.conf
Sending Logstash logs to C:/Users/smiccv0/Documents/software/ELK_Stack/logstash-6.4.2/logs which is now configured via log4j2.properties
[2018-10-04T00:25:53,915][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-10-04T00:25:54,482][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.2"}
[2018-10-04T00:25:59,857][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-10-04T00:26:00,265][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-10-04T00:26:00,269][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-10-04T00:26:00,426][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-10-04T00:26:00,497][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-10-04T00:26:00,500][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-10-04T00:26:00,525][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-10-04T00:26:00,544][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-10-04T00:26:00,565][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-10-04T00:26:01,099][INFO ][logstash.inputs.file     ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"C:/Users/smiccv0/Documents/software/ELK_Stack/logstash-6.4.2/data/plugins/inputs/file/.sincedb_743dc853a82996b534570c7c38d7ddb2", :path=>["C:\\Users\\smiccv0\\Documents\\software\\ELK_Stack\\data\\KPI.csv"]}
[2018-10-04T00:26:01,148][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x21b65afa run>"}
[2018-10-04T00:26:01,202][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-10-04T00:26:01,211][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2018-10-04T00:26:01,466][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Link to the Kibana and Elasticsearch command prompt output:
https://drive.google.com/file/d/1Jn5dxNroovosTke8VwUCOQ0_piuuEpeo/view?usp=sharing

Please let me know how to fix it.
Thanks in advance!

Since there are no errors in the Logstash output, and you haven't explicitly defined a sincedb path in your file input, my guess is that Logstash considers this specific file as already read and won't reprocess it.

You can read more about sincedb files in the file input plugin documentation.

If you need to reprocess the whole file every time you start Logstash, you need to disable the sincedb, like so:

input {
    file {
        path => ["C:\Users\smiccv0\Documents\software\ELK_Stack\data\KPI.csv"]
        start_position => "beginning"
        sincedb_path => "nul"
    }
}
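
For reference, since you didn't set a sincedb_path, Logstash generated one for you; its location is visible in your log output (data/plugins/inputs/file/.sincedb_743dc853a82996b534570c7c38d7ddb2). Deleting that file before starting Logstash should also force a full reprocess, if you'd rather keep the sincedb mechanism enabled.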

Hi,
Thanks for the prompt response. I tried the suggested change, but it still did not work.

logstash output

[2018-10-04T07:53:52,881][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-10-04T07:53:53,474][INFO ][logstash.runner          ] Starting Logstash {"logstash.version"=>"6.4.2"}
[2018-10-04T07:53:58,080][INFO ][logstash.pipeline        ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-10-04T07:53:58,442][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-10-04T07:53:58,442][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-10-04T07:53:58,631][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-10-04T07:53:58,660][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-10-04T07:53:58,676][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-10-04T07:53:58,692][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-10-04T07:53:58,707][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-10-04T07:53:58,738][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-10-04T07:53:59,254][INFO ][logstash.pipeline        ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x1208d62 sleep>"}
[2018-10-04T07:53:59,301][INFO ][filewatch.observingtail  ] START, creating Discoverer, Watch with file and sincedb collections
[2018-10-04T07:53:59,301][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2018-10-04T07:53:59,646][INFO ][logstash.agent           ] Successfully started Logstash API endpoint {:port=>9600}

Thanks

I think there's a spelling error in the sincedb_path value; it should be "null" instead of "nul". But if you want to reprocess the whole file every time, you can instead set the mode to "read". You will also need to set a completed action, like this:

input {
    file {
        path => ["C:\Users\smiccv0\Documents\software\ELK_Stack\data\KPI.csv"]
        mode => "read"
        file_completed_action => "log"
        file_completed_log_path => "C:\Users\smiccv0\Documents\software\ELK_Stack\logs\logstash"
    }
}

This will make Logstash read the whole file when it starts and append the completed file's path to the log at C:\Users\smiccv0\Documents\software\ELK_Stack\logs\logstash. I also think you need to create the containing folder before starting Logstash.
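
As far as I know, the entry appended to that log is just the path of each completed file, so tailing it is a quick way to confirm that Logstash actually read your CSV.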

Hi,
Thanks for the response. I will try and update you.

However, with sincedb_path => 'nul' set, and after deleting the uuid and nul files that had been created before running Logstash, I was able to push data to Elasticsearch every time.
On Windows, specifying sincedb_path => 'nul' and running Logstash doesn't create any 'nul' file the way it does on Mac.
Not sure where the problem is.

Thanks

On Windows "nul" is equal to "/dev/null" on Unix, meaning that Logstash won't create any since_db and will always reprocess the file from the beginning. So it's expected that you don't see any file created.

OK. I still could not send my data to Elasticsearch on Windows. However, on my Mac it seems to work without a glitch.

Let me know if there is something I should try. I am using the latest versions of everything.

Thanks

Ok, I didn't know the "nul" value was special on Windows. Good to know!

@cj_vegi did you try to set the mode to "read"?
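
For example, combined with the disabled sincedb (a sketch, untested on Windows, reusing the same paths as before):

input {
    file {
        path => ["C:\Users\smiccv0\Documents\software\ELK_Stack\data\KPI.csv"]
        mode => "read"
        # disable sincedb persistence so the file is reprocessed on every start
        sincedb_path => "nul"
        file_completed_action => "log"
        file_completed_log_path => "C:\Users\smiccv0\Documents\software\ELK_Stack\logs\logstash"
    }
}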

Hi,
I tried the read mode, but I am still unable to send data into Elasticsearch.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.