I am importing a 120 MB CSV file containing 27,858 US cities with their polygons into Amazon Elasticsearch using Logstash 5.6.2.
I run Logstash with the stdin input rather than the file input so that it stops at the end of the file.
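For reference, this is roughly how I start the run (file and config names are placeholders; logstash.yml with the queue settings sits in the default config/ directory):

cat us_cities.csv | bin/logstash -f cities.conf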
After importing, I only have around 26,400 cities in the index. The rest are missing, with no error in the output or in any of the logs.
Here are some facts:
- The file has one GeoJSON field for the geo shapes, and all of them are well formed.
- All document IDs are unique (and I see no deleted docs in the index; checked as shown below).
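The counts can be verified with something like this (assuming the domain's access policy allows unsigned requests from my IP; otherwise the calls need to be signed):

curl 'https://XXXX.es.amazonaws.com/_cat/indices/cities-*?v&h=index,docs.count,docs.deleted'
curl 'https://XXXX.es.amazonaws.com/cities-*/city/_count'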
I tried using a persistent queue with queue.drain: true in the settings file, but it does not change anything: the process ends without any error and I am still missing records.
The settings file:
queue.type: persisted
queue.drain: true
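From the documentation I understand the queue pages are written under path.data (data/queue by default), so the only check I have found so far is to look at that directory, e.g.:

ls -lhR data/queue

Page and checkpoint files should show up there while events are still queued.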
The configuration file:
input {
  stdin { }
}

filter {
  csv {
    separator => "|"
    quote_char => "@"
    columns => ["gid", "community_id", "hj_id", "name", "state", "lon", "lat", "geo"]
    add_field => {
      "location" => "%{lat},%{lon}"
      "suggest" => "%{name}, %{state}"
    }
    remove_field => ["lat", "lon", "host", "message", "@version"]
  }
  mutate {
    convert => { "gid" => "integer" }
    convert => { "community_id" => "integer" }
  }
  json {
    source => "geo"
    target => "geo"
  }
}

output {
  stdout { codec => rubydebug }
  amazon_es {
    hosts => ["XXXX.es.amazonaws.com"]
    region => "XXX"
    aws_access_key_id => 'XXXX'
    aws_secret_access_key => 'XXXX'
    index => "cities-%{+YYYYMMdd}"
    template => "./cities-template.json"
    template_name => "cities"
    document_type => "city"
    document_id => "%{hj_id}"
  }
}
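In case it is relevant, I believe the number of events going in and out of the pipeline during the run can be compared through the Logstash monitoring API (localhost and the default API port 9600 assumed):

curl -s 'localhost:9600/_node/stats/pipeline?pretty'

If I read the docs right, this reports events.in, events.filtered and events.out, plus queue statistics when the persistent queue is enabled.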
Where should I look next?
How can I check what is in the persistent queue?
Do I need to restart Logstash in a particular way so that it processes the records left in the queue?
Thanks
Xavier