Reading a CSV and indexing it with Logstash is taking too much time

I am using the configuration below for indexing with Logstash, in a File -> Logstash -> Elasticsearch pipeline:

input {
  file {
    path => "C:/Users/temp/100_CC_Records.csv"
    start_position => "beginning"
  }
}

filter {
  csv {
    separator => ","
    columns => ["card_holder_name"]
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    index => "list"
    document_type => "check"
  }
  stdout {}
}

Indexing is taking too much time. Please help me resolve this issue.

I think you will need to provide some additional information. What does your data look like? What is the hardware this is running on? How are the components configured? What indexing throughput are you actually seeing?

card_type_code,card_type_name,issue_bank,card_number,card_holder_name,cvv_cvv2,issue_date,expiry_date,bill_date,card_pin,credit_limit
DS,Discover,Discover,6.4802E+15,Brenda D Peterson,689,Jan-17,Jan-22,4,1998,22700
DC,Diners Club International,Diners Club,3.02952E+13,Dawn U Reese,70,Dec-15,Dec-16,11,3915,12700

There are a hundred records like the ones above.
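For reference, a csv filter that names every column from the header row above would look roughly like the following; this is only a sketch based on the sample data, since my actual filter extracts only card_holder_name:

filter {
  csv {
    separator => ","
    # Column names taken from the header row of the sample CSV above.
    columns => ["card_type_code", "card_type_name", "issue_bank", "card_number",
                "card_holder_name", "cvv_cvv2", "issue_date", "expiry_date",
                "bill_date", "card_pin", "credit_limit"]
  }
}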

System Configuration:
Host : localhost
RAM : 8 GB
OS : Windows 7 Enterprise 64-bit
Java Version : 1.8.0_201

[2019-03-11T16:07:40,231][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2019-03-11T16:07:41,030][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.6.0"}
[2019-03-11T16:11:21,193][WARN ][logstash.outputs.elasticsearch] You are using a deprecated config setting "document_type" set in elasticsearch. Deprecated settings will continue to work, but are scheduled for removal from logstash in the future. Document types are being deprecated in Elasticsearch 6.0, and removed entirely in 7.0. You should avoid this feature If you have any questions about this, please visit the #logstash channel on freenode irc. {:name=>"document_type", :plugin=><LogStash::Outputs::ElasticSearch index=>"wclist", id=>"410d4d4cfdcb001aef1266eb868fe1f1b1bc239a051b0bcca06ecaaab2d5c4e0", hosts=>[//localhost], document_type=>"wccheck", enable_metric=>true, codec=><LogStash::Codecs::Plain id=>"plain_ac9e8779-6478-4a26-aa68-c2bae98e5faa", enable_metric=>true, charset=>"UTF-8">, workers=>1, manage_template=>true, template_name=>"logstash", template_overwrite=>false, doc_as_upsert=>false, script_type=>"inline", script_lang=>"painless", script_var_name=>"event", scripted_upsert=>false, retry_initial_interval=>2, retry_max_interval=>64, retry_on_conflict=>1, ilm_enabled=>false, ilm_rollover_alias=>"logstash", ilm_pattern=>"{now/d}-000001", ilm_policy=>"logstash-policy", action=>"index", ssl_certificate_verification=>true, sniffing=>false, sniffing_delay=>5, timeout=>60, pool_max=>1000, pool_max_per_route=>100, resurrect_delay=>5, validate_after_inactivity=>10000, http_compression=>false>}
[2019-03-11T16:11:46,503][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2019-03-11T16:11:47,537][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>, :added=>[http://localhost:9200/]}}
[2019-03-11T16:11:48,134][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2019-03-11T16:11:48,284][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2019-03-11T16:11:48,320][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the type event field won't be used to determine the document _type {:es_version=>6}
[2019-03-11T16:11:48,406][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost"]}
[2019-03-11T16:11:48,431][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2019-03-11T16:11:48,702][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"default"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2019-03-11T16:13:57,238][INFO ][logstash.inputs.file ] No sincedb_path set, generating one based on the "path" setting {:sincedb_path=>"I:/SBS/ElasticSearch/logstash-6.6.0/data/plugins/inputs/file/.sincedb_bf6a5f7d7ebd9e36584813bfcd4a1221", :path=>["C:/Users/pm85549/Desktop/100_CC_Records.csv"]}
[2019-03-11T16:13:57,411][INFO ][logstash.pipeline ] Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x701fa52b run>"}
[2019-03-11T16:13:57,560][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>}
[2019-03-11T16:13:57,616][INFO ][filewatch.observingtail ] START, creating Discoverer, Watch with file and sincedb collections
[2019-03-11T16:14:40,495][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}

Nothing happens after this log.

I suspect this may be due to the sincedb file preventing the file from being reread. Have a look at the sincedb_path configuration parameter in the documentation, and set it to "NUL" in the file input to disable it.
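A minimal sketch of the adjusted file input, assuming the same CSV path as in the original configuration; on Windows, "NUL" plays the role of /dev/null, so Logstash keeps no sincedb state and re-reads the file from the beginning on every run:

input {
  file {
    path => "C:/Users/temp/100_CC_Records.csv"
    # "NUL" (the Windows equivalent of /dev/null) disables sincedb persistence,
    # so the file is read from the beginning each time Logstash starts.
    sincedb_path => "NUL"
    start_position => "beginning"
  }
}

Alternatively, since the log above shows where the generated sincedb file lives (under data/plugins/inputs/file/), deleting that file before restarting Logstash should also cause the CSV to be picked up again.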
