According to the docs, if I don't set a schedule, the Logstash Elasticsearch input plugin should run once and only once. However, I find Logstash keeps adding new duplicate/triplicate etc. documents to my output the longer I run it.
My config is:
elasticsearch {
  hosts => "https://10.x.x.x:9200"
  ca_file => "/etc/logstash/ca.crt"
  user => "elastic"
  password => "password"
  index => "my-index*"
  query => '{ "query": { "match_all": {} } }'
  size => 500
  scroll => "5m"
  docinfo => true
}
I know that I could set the document ID in the output config so that documents don't get duplicated, but that would put a lot of unnecessary load on the output.
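For reference, this is roughly what I mean by the deduplication workaround (a sketch only; the exact @metadata path depends on the plugin version and the docinfo_target setting, and the hosts/index values here mirror my input config rather than a real output):

```
output {
  elasticsearch {
    hosts => "https://10.x.x.x:9200"
    # Write back to the source index and reuse the original _id,
    # so re-reads overwrite rather than duplicate documents.
    index => "%{[@metadata][_index]}"
    document_id => "%{[@metadata][_id]}"
  }
}
```

This only works because docinfo => true is set on the input, which copies each hit's _index and _id into @metadata.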
Shouldn't Logstash just stop the pipeline once the initial document set has gone through it? (The query should run once and only once.)
Is there something else wrong here?
Where should the scroll ID be getting stored?
Thanks in advance for any help you can give.