Updating documents with Elasticsearch output

I'm trying to figure out how to update documents properly using the Elasticsearch output. The documentation on this is very confusing.

I have a document that might already be in Elasticsearch, and if so I would like to update it. Going over the documentation I see that there are multiple options for updating documents.

So for example this is my current configuration:

if [doc_id] {
  elasticsearch {
    hosts => ["http://elastic:9200"]
    document_type => "log"
    document_id => "%{doc_id}"
    doc_as_upsert => true
    action => "update"
    index => "myindex"
  }
}

This configuration sometimes throws an error saying:

WARN logstash.outputs.elasticsearch - Failed action. {:status=>409, :action=>["update", {:_id=>"6b5b8db751dcd3b8586badfd70dca5", :_index=>"main_solan", :_type=>"log", :_routing=>nil, :_retry_on_conflict=>1}, 2017-07-05T10:27:46.555Z %{host} %{message}], :response=>{"update"=>{"_index"=>"myindex", "_type"=>"log", "_id"=>"6b5b8db751dcd3b8586badfd70dca5", "status"=>409, "error"=>{"type"=>"version_conflict_engine_exception", "reason"=>"[log][6b5b8db751dcd3b8586badfd70dca5]: version conflict, current version [406] is different than the one provided [405]", "index_uuid"=>"fN3yXZ6xSdSb-hdbP3SnOw", "shard"=>"0", "index"=>"myindex"}}}}

I'm not providing the version myself, so I'm not sure how Logstash knows what version I'm currently providing. I'm also not sure if I should be using action=update, or maybe just define upsert=true?

Any ideas?

If I remember correctly, this happened to me too.

See: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning

If I'm correct, it means you are trying to update a document whose version is newer than the one you expect. That is probably due to multithreading or an I/O delay.

But Logstash does not determine any versions. I just insert documents, and if an older event arrives late because of latency or something else, the worst case should be that the latest version is incorrect. From what I understand, Logstash does not do its own versioning, so why am I getting an error message?

You may want to wait for someone who knows how it works behind the scenes (or search for it), but I imagine the following.
Suppose two documents reach the output at about the same time (this can happen if pipeline workers is not set to 1).

  • Doc A arrives in the output.
  • For the update, I imagine a read operation is performed first. It reads the data and its version: 100.
  • The same happens for doc B. Version: 100.
  • Doc A updates. The version of the document in ES is now 101 (incremented automatically).
  • Doc B updates, but the document is now at version 101 while B based its update on version 100. They are no longer compatible, hence the failure.
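
To make the sequence above concrete, here is a minimal Python sketch of the same race. It is only a model of the speculation in this post (update = get + put), not how Elasticsearch or Logstash are actually implemented. The `Store` class, `VersionConflict`, and the field names are all hypothetical and exist only for illustration.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale document version."""


class Store:
    """Hypothetical in-memory stand-in for a single stored document."""

    def __init__(self, doc, version):
        self.doc = dict(doc)
        self.version = version

    def get(self):
        # Read step: return a copy of the document and the version it was read at.
        return dict(self.doc), self.version

    def put(self, doc, expected_version):
        # Write step: reject the write if the document changed since the read.
        if expected_version != self.version:
            raise VersionConflict(
                f"current version [{self.version}] is different "
                f"than the one provided [{expected_version}]"
            )
        self.doc = dict(doc)
        self.version += 1


store = Store({"count": 0}, version=100)

# Both updates read before either one writes, so both see version 100.
doc_a, version_a = store.get()
doc_b, version_b = store.get()

# A writes first and succeeds: the stored version goes from 100 to 101.
doc_a["a"] = 1
store.put(doc_a, version_a)

# B still holds version 100, so its write is rejected.
doc_b["b"] = 2
try:
    store.put(doc_b, version_b)
    conflict = None
except VersionConflict as err:
    conflict = str(err)

print(store.version)
print(conflict)
```

With a single worker the two get + put pairs could not interleave like this, which is why reducing the worker count is worth testing.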

Logstash will also contain version info.

https://www.elastic.co/guide/en/elasticsearch/guide/current/version-control.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/optimistic-concurrency-control.html
So I don't know for sure. Can you try setting pipeline workers to 1? (All of this is speculation, based on the assumption that update = get + put, which would be reasonable for integrity purposes.)

You are correct, Logstash does not know the document's version.
What you see there in the log is the error Elasticsearch reports back to Logstash.

As @Nico-DF said, it's most likely a race condition: you try to update a document before the previous change to it has finished indexing (a version number of 406 points to many back-to-back updates).

You can try raising the retry_on_conflict setting of the Elasticsearch output (your log shows it is currently 1) and see whether that reduces the conflicts.
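
As a rough illustration of why a higher retry count can help, here is a small Python model of "retry the whole get + put cycle on conflict". It is a simplified sketch, not Elasticsearch's implementation. `ContendedStore`, `update_with_retry`, and the `competing_writes` knob are hypothetical names made up for this example; only the `retry_on_conflict` name comes from the real setting.

```python
class VersionConflict(Exception):
    """Raised when a write is based on a stale document version."""


class ContendedStore:
    """Hypothetical in-memory document that other writers keep touching.

    Each of the first `competing_writes` reads is followed by a write from
    another client, which makes the version the reader holds stale.
    """

    def __init__(self, doc, version, competing_writes):
        self.doc = dict(doc)
        self.version = version
        self.competing_writes = competing_writes

    def get(self):
        snapshot = dict(self.doc), self.version
        if self.competing_writes > 0:
            # Another client slips in an update right after our read.
            self.competing_writes -= 1
            self.version += 1
        return snapshot

    def put(self, doc, expected_version):
        if expected_version != self.version:
            raise VersionConflict(f"expected {expected_version}, found {self.version}")
        self.doc = dict(doc)
        self.version += 1


def update_with_retry(store, changes, retry_on_conflict):
    """Get, merge, put; on conflict start over, at most retry_on_conflict times."""
    for attempt in range(retry_on_conflict + 1):
        doc, version = store.get()
        doc.update(changes)
        try:
            store.put(doc, version)
            return attempt  # number of retries that were needed
        except VersionConflict:
            if attempt == retry_on_conflict:
                raise  # retries exhausted, surface the conflict


# Two competing writes, one retry allowed: the update still ends in a conflict.
failing = ContendedStore({"msg": "old"}, version=405, competing_writes=2)
try:
    update_with_retry(failing, {"msg": "new"}, retry_on_conflict=1)
    outcome = "updated"
except VersionConflict:
    outcome = "conflict"

# Same contention, three retries allowed: the third attempt goes through.
passing = ContendedStore({"msg": "old"}, version=405, competing_writes=2)
retries_used = update_with_retry(passing, {"msg": "new"}, retry_on_conflict=3)

print(outcome, retries_used)
```

In this model a retry only helps when contention is bursty. If the same document is updated continuously, a larger retry count just delays the failure.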
