Delete or update to an empty value: which is more efficient in ES?


(Gary Wu) #1

Hi
I use ES in a cluster, and the system is very busy. If the ES cluster receives many delete requests in a short time, could they noticeably reduce cluster performance?

Would it be better not to delete the index right away, but to do it later? For example, while the system is busy, only update all key-values to an empty value { }. When the cluster is idle, delete all the related indexes in one bulk request.

Thanks


(Magnus Bäck) #2

Update operations are equivalent to deleting the old document and adding the updated document as a new document. Hence, update + delete is more expensive than just a delete.
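
To make the comparison concrete, here is a toy cost model in Python. It is only a sketch of the reasoning above, not Elasticsearch internals: `ToyIndex` and its counters are hypothetical names, and the only assumption is that an update marks the old copy as deleted and writes the whole document again.

```python
class ToyIndex:
    """Toy append-only store. NOT Elasticsearch internals, only the cost model."""

    def __init__(self):
        self.live = {}        # doc id -> position of the live copy in entries
        self.entries = []     # append-only list of (doc_id, source)
        self.writes = 0       # documents written
        self.tombstones = 0   # old copies marked as deleted

    def index(self, doc_id, source):
        if doc_id in self.live:
            self.tombstones += 1      # the previous copy is only marked as deleted
        self.entries.append((doc_id, source))
        self.live[doc_id] = len(self.entries) - 1
        self.writes += 1

    def update(self, doc_id, partial):
        # Read the old source, merge the change, write the whole document again.
        _, old = self.entries[self.live[doc_id]]
        self.index(doc_id, {**old, **partial})

    def delete(self, doc_id):
        if self.live.pop(doc_id, None) is not None:
            self.tombstones += 1


def cost(ops):
    """Return (extra writes, extra tombstones) caused by ops on one existing document."""
    idx = ToyIndex()
    idx.index("123", {"title": "My first blog entry"})
    writes, tombstones = idx.writes, idx.tombstones
    ops(idx)
    return idx.writes - writes, idx.tombstones - tombstones


delete_only = cost(lambda i: i.delete("123"))
update_then_delete = cost(lambda i: (i.update("123", {"title": ""}), i.delete("123")))
print(delete_only)          # (0, 1)
print(update_then_delete)   # (1, 2)
```

In this model a plain delete costs one tombstone, while emptying the document first and deleting it later costs one extra write and one extra tombstone.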


(Gary Wu) #3

Hi @Magnus,
Thank you for your reply. I am still confused about one thing. If I update only one key-value in the index, is the whole document updated?
If the answer is no, I have another idea. I can add a new key-value flag (default "1") to every index. When an index is deleted while the system is busy, I just update the flag to "0". When the cluster is idle, I can delete all unused indexes whose flag is "0" in a batch. Is this a reasonable idea?

Thanks


(Magnus Bäck) #4

Thank you for your reply. I am still confused about one thing. If I update only one key-value in the index, is the whole document updated?

Yes.

https://www.elastic.co/guide/en/elasticsearch/reference/2.1/_updating_documents.html

If the answer is no, I have another idea. I can add a new key-value flag (default "1") to every index. When an index is deleted while the system is busy, I just update the flag to "0". When the cluster is idle, I can delete all unused indexes whose flag is "0" in a batch. Is this a reasonable idea?

You mean you'd add a separate document that indicates whether the index is going to be deleted? How are you going to use that in your queries? Anyway, deleting the whole index is a cheap operation.


(Gary Wu) #5

Thanks a lot. From the reference, I see that every update causes the old document to be deleted (not immediately, but later) and a new document to be created. The index then points to the new document.

There was a mistake in my last post. I meant that there is an additional field in every document. For example:

PUT /website/blog/123
{
  "title": "My first blog entry",
  "text": "I am starting to get the hang of this...",
  "date": "2014/01/02",
  "deleted_flag": true
}
After the document is deleted, "deleted_flag" is updated to false.
Every query then needs an extra condition, as below.
GET /website/blog/_search
{
  "query": {
    "bool": {
      "must": { "match": xxxx },  // the actual query conditions
      "filter": { "term": { "deleted_flag": true } }  // extra condition: skip documents whose flag was set to false on delete
    }
  }
}

I want to use esRdd from Spark to run some operations against ES, but it seems that esRdd doesn't support delete operations. I am trying to find an alternative way to get the same result in ES.
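
One possible alternative, sketched here in Python rather than Spark, is to build the request bodies for the bulk API directly: "update" actions to set the flag while the cluster is busy, and "delete" actions for the flagged documents once it is idle. The helper name `bulk_body` and the sample ids are hypothetical; the index and type come from the example above, and sending the body to the `_bulk` endpoint is left out. As noted earlier in the thread, marking first and deleting later costs more in total than deleting directly.

```python
import json


def bulk_body(action, ids, index="website", doc_type="blog", payload=None):
    """Build a newline-delimited _bulk request body for the given document ids."""
    lines = []
    for doc_id in ids:
        meta = {"_index": index, "_type": doc_type, "_id": doc_id}
        lines.append(json.dumps({action: meta}))
        if payload is not None:       # "delete" actions have no second line
            lines.append(json.dumps(payload))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Busy period: set the flag to false instead of removing the documents.
mark = bulk_body("update", ["123", "124"], payload={"doc": {"deleted_flag": False}})

# Idle period: delete the documents returned by a search on the flag.
hits = [{"_id": "123"}, {"_id": "124"}]   # sample data shaped like hits.hits in a search response
purge = bulk_body("delete", [h["_id"] for h in hits])

print(mark, end="")
print(purge, end="")
```

Keeping the ids in one bulk request avoids one HTTP round trip per document, which is the main saving this approach can offer.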

