I'm using Elasticsearch to store and analyse logs. Since I wanted old logs to be deleted automatically, I added a TTL to my mapping. Now I sometimes get version conflict exceptions when my (PHP) application tries to update a timestamp in one of the fields. I update the field using cURL, and because of session locking a single user cannot issue more than one curl_exec() call at a time. Furthermore, I do not provide version information explicitly in the update query. There are two Elasticsearch servers handling the log index; however, all queries are handled by only one of them. The other one is just a standby. The exceptions started only after I set a TTL, so I was wondering whether there might be a correlation. Can anyone shed some light on this?
This can indeed be caused by the TTL. The following race can occur:
1. The update API gets the current version of the document (V1).
2. Elasticsearch deletes the document because of its TTL, which increments
   the version of the document to V2 > V1.
3. Elasticsearch builds the updated document and adds it to the index. This
   only succeeds if the current version in the index is still V1, but it is
   now V2 because of the TTL, so the update fails.
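To make the race concrete, here is a toy Python model of the read-then-write-with-expected-version pattern described in the steps above. `ToyIndex`, `update` and `VersionConflict` are hypothetical names, not the Elasticsearch client API; the model simply assumes, as the answer states, that a TTL expiry is a delete and that a delete bumps the document's version.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class VersionConflict(Exception):
    """Raised when a write's expected version no longer matches the stored one."""


@dataclass
class Doc:
    source: dict
    version: int


class ToyIndex:
    """Hypothetical in-memory model of an index with internal versioning."""

    def __init__(self) -> None:
        self.docs: Dict[str, Doc] = {}
        # Version counters outlive deletes, so a later write sees the bumped version.
        self.versions: Dict[str, int] = {}

    def index(self, doc_id: str, source: dict, expected_version: Optional[int] = None) -> int:
        current = self.versions.get(doc_id, 0)
        if expected_version is not None and expected_version != current:
            raise VersionConflict(f"expected v{expected_version}, found v{current}")
        self.versions[doc_id] = current + 1
        self.docs[doc_id] = Doc(dict(source), current + 1)
        return current + 1

    def get(self, doc_id: str) -> Optional[Doc]:
        return self.docs.get(doc_id)

    def delete(self, doc_id: str) -> None:
        # Assumption (from the answer): a TTL expiry is a delete, and a delete bumps the version.
        self.versions[doc_id] = self.versions.get(doc_id, 0) + 1
        self.docs.pop(doc_id, None)


def update(idx: ToyIndex, doc_id: str, changes: dict,
           between: Callable[[], None] = lambda: None) -> int:
    """Read-modify-write: read version V1, then write back expecting V1."""
    doc = idx.get(doc_id)              # step 1: read the current version (V1)
    if doc is None:
        raise KeyError(doc_id)
    between()                          # step 2: e.g. the TTL purge deletes the doc here
    merged = {**doc.source, **changes}
    # step 3: write back, expecting the version read in step 1
    return idx.index(doc_id, merged, expected_version=doc.version)


idx = ToyIndex()
idx.index("log-1", {"ts": 1})                      # version 1
print(update(idx, "log-1", {"ts": 2}))             # no interference: prints 2
try:
    update(idx, "log-1", {"ts": 3}, between=lambda: idx.delete("log-1"))
except VersionConflict as exc:
    print("conflict:", exc)                        # conflict: expected v2, found v3
```

Without interference the write succeeds; when the delete lands between the read and the write, the stored version has moved on and the write is rejected, even though only one client ever issued an update. This matches the situation in the question, where session locking rules out concurrent updates from the application itself.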
For your information, a better way to evict old logs is generally to use
time-based indices (e.g. one index per week) and to remove whole indices
once they become too old. This is lighter for Elasticsearch: removing an
old index is just a matter of deleting files on the filesystem, while
deleting lots of documents can trigger heavy merges in order to reclaim
the disk space.
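As a minimal sketch of the weekly-index approach, the helpers below derive an index name from a date and pick which existing indices fall outside a retention window. The `logs-YYYY.wWW` naming scheme, the `PREFIX` value and all function names are hypothetical choices for illustration; the week numbering uses ISO weeks from Python's `datetime` (`date.fromisocalendar` needs Python 3.8+).

```python
from datetime import date, timedelta
from typing import Iterable, List

PREFIX = "logs"  # hypothetical index prefix


def weekly_index_name(day: date, prefix: str = PREFIX) -> str:
    """Name of the weekly index that log entries from `day` would be written to."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}.w{iso_week:02d}"


def week_start(day: date) -> date:
    """Monday of the ISO week containing `day`."""
    return day - timedelta(days=day.weekday())


def parse_week(name: str, prefix: str = PREFIX) -> date:
    """Inverse of weekly_index_name: the Monday of the week an index covers."""
    year_s, week_s = name[len(prefix) + 1:].split(".w")
    return date.fromisocalendar(int(year_s), int(week_s), 1)


def indices_to_delete(existing: Iterable[str], today: date, keep_weeks: int,
                      prefix: str = PREFIX) -> List[str]:
    """Weekly indices that lie entirely before the most recent `keep_weeks` weeks."""
    oldest_kept = week_start(today) - timedelta(weeks=keep_weeks - 1)
    stale = [name for name in existing
             if name.startswith(prefix + "-") and parse_week(name, prefix) < oldest_kept]
    return sorted(stale)


print(weekly_index_name(date(2013, 12, 30)))   # ISO week 1 of 2014: logs-2014.w01
today = date(2014, 3, 12)
existing = [weekly_index_name(today - timedelta(weeks=i)) for i in range(6)]
print(indices_to_delete(existing, today, keep_weeks=4))  # ['logs-2014.w06', 'logs-2014.w07']
```

Each returned name could then be dropped with a single delete-index request (for example `curl -XDELETE 'http://localhost:9200/logs-2014.w06'`, assuming the default host and port), instead of letting the TTL purge delete documents one by one.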