Understanding Why Bulk Update is so Fast


(Ishan Durugkar) #1

Hi,
I need help understanding the ES/Lucene update operation.

I have to index a large number of documents that share the same content
but have different names.

Structure:

"file": {
    "content": {
        "type": "string"
    },
    "name": {
        "type": "string"
    }
}

Now if I index all the documents separately, it takes close to 10 hours.
But if I index just the unique content and then, for every repeated content,
update the existing document and add to its 'name' field, indexing takes
only about 45 minutes. The update operations are sent as bulk requests.
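
For reference, here is a minimal sketch of how a bulk update body of this kind might be built. It is not my exact code: the index name "files", the document ids, and the helper build_bulk_updates are made up for illustration, and the script assumes 'name' is stored as an array. The script syntax shown follows newer Elasticsearch versions and differs in older ones, which may also require a "_type" entry in the action line.

```python
import json


def build_bulk_updates(index, updates):
    """Build a newline-delimited JSON body for the _bulk endpoint.

    `updates` is a list of (doc_id, new_name) pairs. Each pair becomes
    two lines: an "update" action line and a scripted-update body line.
    """
    lines = []
    for doc_id, new_name in updates:
        # Action/metadata line: which document to update.
        action = {"update": {"_index": index, "_id": doc_id}}
        # Update body: append the new name to the existing 'name' array.
        # Assumption: 'name' already exists on the document as a list.
        body = {
            "script": {
                "source": "ctx._source.name.add(params.name)",
                "params": {"name": new_name},
            }
        }
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    if not lines:
        return ""
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical example: two extra names for the same unique content.
payload = build_bulk_updates("files", [("hash-1", "a.txt"), ("hash-1", "b.txt")])
```

The resulting string would be sent as the body of a single _bulk request, so many name additions travel in one round trip.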

I have checked, and all the names are added to the 'name' field.
How is the update operation so fast? I thought an update internally
deletes the old document and creates a new one.

ES settings: 1 node, 3 shards
Heap size: 4GB



(system) #2