I would like to know under which circumstances upserts can fail. We have been doing inserts and upserts through Logstash. Many times we have noticed that an upsert didn't go through even though inserts made at the same time did. We didn't notice anything in the logs related to upsert failures.
In between the get and indexing phases of the update, it is possible that another process has already updated the same document. By default, the update will then fail with a version conflict exception. Is that your case? If not, can you provide the exception the bulk request is returning?
You can use the retry_on_conflict parameter to control how many times to retry the update before finally throwing an exception.
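For reference, here is a minimal sketch of such an upsert with retry_on_conflict using the Python elasticsearch client; the host, index, type, field, and id are placeholders, not anything from your setup:

```python
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConflictError

es = Elasticsearch(["http://localhost:9200"])

try:
    # Update the document if it exists, otherwise insert the "doc" body as-is.
    # retry_on_conflict retries the get-then-index cycle when another writer
    # bumps the version in between; only after the retries are exhausted does
    # the request fail with a 409 version conflict.
    es.update(
        index="my-index",
        doc_type="events",
        id="event-1",
        body={"doc": {"status": "processed"}, "doc_as_upsert": True},
        retry_on_conflict=5,
    )
except ConflictError as e:
    # This is the version conflict exception mentioned above.
    print("version conflict after retries:", e.info)
```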
@luiz.santos we have seen logs in Logstash saying the bulk request failed and it is retrying, but the actual reason wasn't clear from those logs. I have read about the version conflict exception, but we never saw one in the logs.
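One thing that may help pin down the real cause: the bulk API reports success or failure per item, so individual failures (version conflicts, mapping errors, queue rejections) only become visible if you inspect each item rather than the overall request. A rough sketch with the Python client's streaming_bulk helper, with placeholder index/type/field names:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import streaming_bulk

es = Elasticsearch(["http://localhost:9200"])

actions = [
    {
        "_op_type": "update",
        "_index": "my-index",
        "_type": "events",
        "_id": "event-1",
        "_retry_on_conflict": 5,
        "doc": {"status": "processed"},
        "doc_as_upsert": True,
    }
]

# raise_on_error=False lets us log the exact per-item error instead of
# getting a single opaque "bulk request failed" message.
for ok, item in streaming_bulk(es, actions, raise_on_error=False):
    if not ok:
        print("bulk item failed:", item)
```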
Other observations:
We observed that memory utilization was at 100% on both data nodes during the period when we lost data.
We had set retry_on_conflict to 5 through the Logstash Elasticsearch output plugin.
The bulk request queue size was increased from 50 to 200, and the queue would sometimes hit the 200 limit (rejections can be checked as in the sketch after this list).
We upgraded Elasticsearch from 5.2.0 to 5.5.x due to an issue with circuit breaker exceptions.
The last time this data loss occurred, we upgraded the number of cores and the memory (we were on a low configuration, so a hardware upgrade was due anyway). Since the hardware upgrade (it's been a week) there has been no data loss.
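If the bulk queue was hitting its 200 limit, the nodes would have been rejecting bulk items, and rejected items are lost unless the client's retries eventually succeed, which could look exactly like silent upsert loss. A sketch of how the rejection counters could be checked with the Python client (host is a placeholder):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

# The "rejected" column counts bulk operations a node turned away because its
# bulk queue was full; a non-zero, growing value here during the data-loss
# window would point at queue rejections rather than version conflicts.
print(es.cat.thread_pool(
    thread_pool_patterns="bulk",
    h="node_name,name,active,queue,rejected",
    v=True,
))
```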