I want to understand the differences in performance between update and create.
I have a document that could be updated up to 4 times in my system. Does it make more sense
to only create new documents, with a property that tells me which document is the most up to date?
Or should I just update 4 times? Asking from a performance perspective.
I only update 1 field in these 4 updates so this will be a partial update and not a whole document update.
In addition, Elasticsearch is NRT and my refresh interval is less than 1 second. Can I update a document less than
1 second after I inserted it? Will it change the document in memory and store only the updated version on disk?
From a performance perspective this doesn't really matter. An update means marking the old document as deleted and indexing the document fresh.
Updates are consistent, but updating a document before it has been refreshed is more costly, because Elasticsearch has to force a refresh so that it can fetch the document and update it. From that perspective, I think it is better to just index the new version of the document "on top" of the old one, under the same ID. No refresh ought to be required in that case, though that is worth testing just to make sure.
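
To make the two options concrete, here is a rough sketch of the requests involved. It only builds the HTTP method, path, and JSON body using the standard library and does not talk to a cluster. It assumes the typeless REST endpoints of Elasticsearch 7 and later (`POST /<index>/_update/<id>` and `PUT /<index>/_doc/<id>`). The index name `orders`, the ID `42`, the field names, and both helper functions are made up for illustration.

```python
import json


def partial_update_request(index, doc_id, changed_fields):
    """Partial update: send only the changed fields under "doc".

    Elasticsearch has to fetch the current document, merge the fields,
    and index the merged result as a new version.
    """
    return {
        "method": "POST",
        "path": f"/{index}/_update/{doc_id}",
        "body": json.dumps({"doc": changed_fields}),
    }


def full_index_request(index, doc_id, document):
    """Index "on top": send the whole document under the same ID.

    No fetch of the old document is needed; the old version is marked
    as deleted and this one replaces it.
    """
    return {
        "method": "PUT",
        "path": f"/{index}/_doc/{doc_id}",
        "body": json.dumps(document),
    }


if __name__ == "__main__":
    # Hypothetical document with one field that changes over time.
    print(partial_update_request("orders", "42", {"status": "shipped"}))
    print(full_index_request("orders", "42",
                             {"customer": "acme", "status": "shipped"}))
```

The trade-off is that indexing on top requires the client to have the full document at hand, whereas the partial update only needs the changed field. Which one is cheaper in your setup is worth measuring rather than assuming.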