ES-to-ES reindexing: why are documents deleted in the destination index?


(Eugene Glotov) #1

Hello.

I have the following Logstash configuration file for reindexing on AWS ES:

input {
  elasticsearch {
    hosts => "search-.....es.amazonaws.com:443"
    index => "logstash-source-index*"
    #query => "*"
    size => 500
    scroll => "5m"
    docinfo => true
    ssl => true
  }
}

filter {
}

output {
  if [type] == "tablet-heartbeat-prepared" {
    stdout {
      codec => rubydebug
    }
    elasticsearch {
      document_id => "%{id}"
      hosts => "search-......es.amazonaws.com:443"
      manage_template => false
      ssl => true
      flush_size => 250000
      index => "logstash-destination-index"
      document_type => "%{type}"
    }
  }
}

I am using the following command to run this config:

sudo time /usr/share/logstash/bin/logstash --path.settings="/etc/logstash" --log.level=debug --config.debug --path.logs=/etc/logstash -f /etc/logstash/conf.d/es-to-es.conf

After execution I don't see any errors in /etc/logstash/logstash-plain.log, and I see good documents in stdout. But the destination index contains many deleted documents and only a small fraction of normal documents.

https://search-......es.amazonaws.com/_cat/indices/?v

health status index                      uuid      pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-destination-index L3WX...PQ   5   1         64          388      1.2mb          1.2mb

Why are the documents deleted?


(Magnus Bäck) #2

Perhaps because ES doesn't actually update documents but rather deletes old ones and creates new ones? Hence, updating a document will increase the "deleted" counter until the segment the now obsolete document lived in is merged and the document is actually deleted from the store.
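
To make the bookkeeping concrete, here is a toy model in Python. It is only a sketch of the counting described above, not real Elasticsearch code: the class `ToyIndex` and its methods are made up for illustration, and it assumes no segment merge happens during the run.

```python
class ToyIndex:
    """Toy model of the docs.count / docs.deleted counters (hypothetical, not ES code)."""

    def __init__(self):
        self.live = {}         # document ID -> current source
        self.docs_deleted = 0  # obsolete copies still waiting for a segment merge

    def index(self, doc_id, source):
        # Writing to an existing ID marks the old copy as deleted and stores a new one.
        if doc_id in self.live:
            self.docs_deleted += 1
        self.live[doc_id] = source

    def merge(self):
        # A segment merge physically drops the obsolete copies.
        self.docs_deleted = 0

    @property
    def docs_count(self):
        return len(self.live)


# Assumption for illustration: 452 writes that share only 64 distinct IDs.
idx = ToyIndex()
for n in range(452):
    idx.index(f"id-{n % 64}", {"seq": n})

print(idx.docs_count, idx.docs_deleted)
```

Under those assumptions the model ends with 64 live and 388 deleted documents, the same shape as the `_cat/indices` output in the question. On a real cluster the deleted counter can be lower, because merges may reclaim obsolete copies at any time.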


(Eugene Glotov) #3

So, if I understand correctly, the problem is possible duplicates. But I see a normal "id" field in stdout for each document. And it's a new index, so how can these documents be updated?

A typical document in logstash-source-index has a unique ID:

{
...
                   "@version" => "1",
                         "id" => "6d749309-75e2-4848-a197-a91c57ba2d43",
}
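
One way to test the duplicate hypothesis is to count how often each "id" value occurs among the events sent to the output. The Python sketch below assumes the events have already been collected into a list of dicts (for example, parsed from the stdout dump). The helper `summarize_ids` and the sample documents are hypothetical; only the first ID comes from the document shown above. It also counts events with no "id" field at all, because as far as I know Logstash leaves an unresolved `%{id}` reference as literal text, so such events would all share one document ID.

```python
from collections import Counter


def summarize_ids(docs, id_field="id"):
    """Count total events, events missing the ID field, and repeated IDs."""
    ids = [doc[id_field] for doc in docs if id_field in doc]
    counts = Counter(ids)
    duplicates = {doc_id: n for doc_id, n in counts.items() if n > 1}
    return {
        "total": len(docs),
        "missing_id": len(docs) - len(ids),
        "unique_ids": len(counts),
        "duplicates": duplicates,
    }


# Hypothetical sample: the repeat, the second ID, and the ID-less event are invented.
docs = [
    {"id": "6d749309-75e2-4848-a197-a91c57ba2d43", "type": "tablet-heartbeat-prepared"},
    {"id": "6d749309-75e2-4848-a197-a91c57ba2d43", "type": "tablet-heartbeat-prepared"},
    {"id": "hypothetical-id-2", "type": "tablet-heartbeat-prepared"},
    {"type": "tablet-heartbeat-prepared"},
]

summary = summarize_ids(docs)
print(summary)
```

If `unique_ids` comes out far smaller than `total`, repeated writes to the same document ID would explain a large docs.deleted value even in a brand-new index.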
