We are heavy users of Elasticsearch for log searching; one log job (call it 'A') is now indexing more than 10,000 lines per minute.
We now have another log, 'B', which we want to join into A by _id, and we have decided to use the update operation.
B is relatively small, about 2 million documents a day, and not real-time (but the volume will grow in the future).
Does anybody here know the performance characteristics of update? Will the cluster's performance degrade under frequent updates?
Both production experience and theoretical explanations are appreciated.
Thank you.
An update operation is nothing other than a reindex operation, which in turn marks the old document as deleted and creates a new document. An update operation also fetches the document from the index first and then applies the changes specified in the request. So basically you are doing two million more get operations and two million more index operations per day, something that should work, depending on whether your current cluster is already at capacity or not. As usual, the easiest way to find out is to try it on your staging/testing systems...
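The get-merge-reindex cycle described above can be sketched in plain Python. This is only an illustration of the mechanics, not client code; the index name, document ID, and field names are made up, and a real deployment would use the Update or Bulk API against the cluster:

```python
def update_doc(index, doc_id, partial):
    """Simulate an Elasticsearch partial update: get the current
    document, merge in the new fields, mark the old version as
    deleted (a tombstone), and index a new version."""
    versions = index[doc_id]
    old = versions[-1]
    old["deleted"] = True                    # old doc is only marked deleted
    merged = {**old["_source"], **partial}   # partial update applied to source
    versions.append({
        "_source": merged,
        "_version": old["_version"] + 1,     # reindex bumps the version
        "deleted": False,
    })
    return versions[-1]

# Joining a hypothetical 'B' record into an existing 'A' document by _id:
a_index = {"42": [{"_source": {"msg": "GET /home 200"},
                   "_version": 1, "deleted": False}]}
new = update_doc(a_index, "42", {"user": "alice"})
```

This is why frequent updates cost roughly one get plus one index operation each, and why the deleted-but-not-yet-purged old versions add segment-merge work in the background.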