Hi
I use ES in a cluster, and the system is very busy. If the ES cluster receives many delete requests in a short time, could these requests noticeably reduce cluster performance?
Would it be better not to delete the index immediately, but to do it later? For example, while the system is busy, only update all key-values to an empty value { }. When the cluster is idle, delete all the related index in one bulk request.
Update operations are equivalent to deleting the old document and adding the updated document as a new document. Hence, update + delete is more expensive than just a delete.
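To make the cost comparison concrete, here is a toy sketch in Python. It is not Elasticsearch or Lucene code; the `ToyIndex` class and its counters are hypothetical and only model the behaviour described above, where an update marks the old copy as deleted and writes a full new copy.

```python
# Toy model of an append-only index with tombstones.
# Illustration only: ToyIndex and cost() are hypothetical names,
# not part of Elasticsearch or Lucene.
class ToyIndex:
    def __init__(self):
        self.segment = []     # append-only list of (doc_id, source)
        self.live = {}        # doc_id -> position of the live copy
        self.tombstones = 0   # copies marked deleted, awaiting cleanup
        self.writes = 0       # full documents written

    def index(self, doc_id, source):
        if doc_id in self.live:
            self.tombstones += 1  # the old copy is only marked as deleted
        self.segment.append((doc_id, source))
        self.live[doc_id] = len(self.segment) - 1
        self.writes += 1

    def update(self, doc_id, partial):
        # Changing one field still rewrites the whole document.
        _, old = self.segment[self.live[doc_id]]
        self.index(doc_id, {**old, **partial})

    def delete(self, doc_id):
        if self.live.pop(doc_id, None) is not None:
            self.tombstones += 1


def cost(operations):
    """Return (extra writes, extra tombstones) caused by the operations."""
    idx = ToyIndex()
    idx.index("123", {"title": "My first blog entry"})
    base_writes, base_tombstones = idx.writes, idx.tombstones
    for op in operations:
        op(idx)
    return idx.writes - base_writes, idx.tombstones - base_tombstones


delete_only = cost([lambda i: i.delete("123")])
update_then_delete = cost([
    lambda i: i.update("123", {"title": ""}),
    lambda i: i.delete("123"),
])
print(delete_only)         # (0, 1)
print(update_then_delete)  # (1, 2)
```

In this model the deferred approach costs one extra document write and one extra tombstone per document compared with deleting right away.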
Hi @Magnus,
Thank you for your reply. I am still confused about one thing. If I update only one key-value in the index, is the whole document updated?
If the answer is no, I have another idea. I can add a new key-value flag (default "1") to every index; if the index is deleted, I just update the flag to "0" because the system is busy. When the cluster is idle, I could delete, in a batch, all unused index entries whose flag is "0". Is that a reasonable idea?
Thank you for your reply. I am still confused about one thing. If I update only one key-value in the index, is the whole document updated?
Yes.
If the answer is no, I have another idea. I can add a new key-value flag (default "1") to every index; if the index is deleted, I just update the flag to "0" because the system is busy. When the cluster is idle, I could delete, in a batch, all unused index entries whose flag is "0". Is that a reasonable idea?
You mean you'd add a separate document that indicates whether the index is going to be deleted? How are you going to use that in your queries? Anyway, deleting the whole index is a cheap operation.
Thanks a lot. From the reference, I see that every update causes the old document to be deleted (not immediately, but later) and a new document to be created, and the index then points to the new document.
There was a mistake in my last post. I meant that there is an additional field in every document, for example:
PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "I am starting to get the hang of this...",
  "date": "2014/01/02",
  "deleted_flag": true
}
After the document is deleted, "deleted_flag" will be updated to false.
Every time data is queried, an extra condition should be added, as below.
GET /website/blog/123
{
I want to use esRdd from Spark to perform some operations in ES, but it seems that esRdd doesn't support delete operations. I am trying to find an alternative way to get the same result in ES.
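As a rough sketch of the flag-based plan discussed in this thread, the Python below builds the two request bodies it would need: a search body that keeps only documents whose "deleted_flag" is still true, and a newline-delimited body for the `_bulk` endpoint that deletes the flagged documents later. The helper names `live_docs_query` and `bulk_delete_body` are hypothetical. It also assumes an Elasticsearch version that accepts the `bool`/`filter` query form and still uses mapping types (`_type`); neither is confirmed for the cluster in question. No request is actually sent.

```python
import json


def live_docs_query(flag_field="deleted_flag"):
    """Search body that matches only documents whose flag is still true.

    Assumption: the bool/filter/term form of the query DSL is available.
    """
    return {"query": {"bool": {"filter": [{"term": {flag_field: True}}]}}}


def bulk_delete_body(index, doc_type, doc_ids):
    """Build a newline-delimited body of delete actions for POST /_bulk.

    Assumption: the cluster still uses mapping types, so "_type" is sent.
    """
    lines = [
        json.dumps({"delete": {"_index": index, "_type": doc_type, "_id": doc_id}})
        for doc_id in doc_ids
    ]
    if not lines:
        return ""
    # The bulk API expects every line, including the last, to end with "\n".
    return "\n".join(lines) + "\n"


# Example: delete two flagged blog posts in a single bulk request body.
body = bulk_delete_body("website", "blog", ["123", "124"])
print(json.dumps(live_docs_query()))
print(body, end="")
```

Keep in mind the earlier answer in this thread: setting the flag is itself a full document update, so this plan adds work overall and only moves the delete to a quieter time.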