Elasticsearch input plugin creates more documents than what is in the originating index

The Logstash elasticsearch input plugin seems to emit more documents than exist in the index it reads from. Below is the pipeline configuration I'm using. What do I need to do to ensure each record in the originating index is read only once? Why does the document count in the new index keep increasing while Logstash keeps running? The originating index has about 2 million records.

Note: my goal is to have two separate indexes, one with additional derived data, so the Reindex API is likely not what I want.

input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test"
    query => '{ "query": {"match_all": {}} }'
    size => 10000
    scroll => "5m"
  }
}
filter {
  mutate {
    ...
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-new"
  }
}

Thanks!

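It may be that the data is being duplicated. Without an explicit document ID, the elasticsearch output lets Elasticsearch auto-generate a new ID for every event, so any record that is read more than once is stored as an additional document. You can prevent this by assigning the origin index's document _id to the destination document. To do that, set docinfo => true in the elasticsearch input so that the source document's metadata (including its _id) is attached to each event.

Then the elasticsearch output needs document_id => "%{[@metadata][_id]}" (adjust the field path if your docinfo_target points somewhere else).

With that in place, a record that is read again overwrites its existing copy instead of creating a new document, so you get a 1:1 copy of the index.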
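
Putting the two settings together, a minimal sketch of the pipeline from the question could look like the following. It assumes the metadata lands under [@metadata], which is the plugin's default when ECS compatibility is disabled; with ECS compatibility enabled the default location is [@metadata][input][elasticsearch], so the document_id reference would need to change accordingly. The filter section is left exactly as elided in the question.

```
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test"
    query => '{ "query": {"match_all": {}} }'
    size => 10000
    scroll => "5m"
    # Attach the source document's metadata (_index, _id, ...) to each event
    docinfo => true
    # Assumption: store it under [@metadata]; this is the default only when
    # ECS compatibility is disabled, so it is set explicitly here
    docinfo_target => "[@metadata]"
  }
}
filter {
  mutate {
    ...
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "test-new"
    # Reuse the source _id so a re-read record updates the existing
    # document instead of being indexed again under a new auto-generated ID
    document_id => "%{[@metadata][_id]}"
  }
}
```

Because fields under @metadata are not sent to the output, the copied _id does not show up as an extra field in the documents of test-new.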
