According to the docs, if I don't set a schedule, the Logstash Elasticsearch input plugin should run once and only once. However, I find Logstash keeps adding new duplicate/triplicate etc. documents to my output the longer I run it.
My config is:
elasticsearch {
  hosts => "https://10.x.x.x:9200"
  ca_file => "/etc/logstash/ca.crt"
  user => "elastic"
  password => "password"
  index => "my-index*"
  query => '{ "query": { "match_all": {} } }'
  size => 500
  scroll => "5m"
  docinfo => true
}
I know that I could set the document ID in the output config so that documents don't get duplicated, but that would put a lot of unnecessary load on the output.
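For reference, this is roughly what I mean by the deduplication workaround (a sketch only; the exact @metadata path depends on the plugin version and the docinfo_target setting, and the hosts/index values here mirror my input config rather than a real output):

```
output {
  elasticsearch {
    hosts => "https://10.x.x.x:9200"
    # Write back to the source index and reuse the original _id,
    # so re-reads overwrite rather than duplicate documents.
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}
```

This only works because docinfo => true is set on the input, which copies each hit's _index and _id into @metadata.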
Shouldn't Logstash just stop the pipeline once the initial document set has gone through it? (The query should run once and only once.)
Is there something else wrong here?
Where should the scroll ID be getting stored?
Thanks in advance for any help you can give.