Delete By Query and Index Size


(Ramdev Wudali) #1

We are running an ES cluster on 2.3.3 and have the Delete-By-Query plugin installed.
While updating an index using the delete-by-query API, I noticed the size of the index increasing.
That said, I was wondering whether it's possible to run optimize on an index that is being updated (using delete-by-query).

The delete-by-query request is pretty large: it deletes a major portion of the documents in the index, on the order of tens of millions of records.

thanks

Ramdev


(Christoph) #2

Since Elasticsearch uses Lucene as its underlying data storage layer, and Lucene uses append-only data structures, deleting documents is done by marking them as deleted until the underlying files get merged. So the space may only be reclaimed later, and a large delete-by-query may need some auxiliary space while it is running. I wouldn't recommend running "optimize" while such a query is running, since executing the query already puts load on the server, and the merge process that "optimize" triggers should run automatically later anyway.
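
To get a feel for how much of the index consists of documents that are marked as deleted but not yet merged away, one option is to look at the `docs.count` and `docs.deleted` figures that the indices stats API reports. The snippet below is only a rough sketch: it works on an already-parsed response, and the index name `my_index` and the numbers in the sample are made up for illustration.

```python
def deleted_fraction(stats, index):
    """Fraction of documents in the primaries of `index` that are marked deleted.

    `stats` is assumed to be the parsed JSON body of GET /<index>/_stats/docs.
    """
    docs = stats["indices"][index]["primaries"]["docs"]
    live = docs["count"]        # documents still visible to searches
    deleted = docs["deleted"]   # marked deleted, space not yet reclaimed
    total = live + deleted
    return deleted / total if total else 0.0


# Hypothetical, trimmed-down response; real responses contain more fields.
sample = {
    "indices": {
        "my_index": {
            "primaries": {"docs": {"count": 25000000, "deleted": 75000000}}
        }
    }
}

print(deleted_fraction(sample, "my_index"))
```

If that fraction stays high long after the delete-by-query has finished, the background merges have not caught up yet; it should drop as segments get merged.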
If you are deleting a majority of the documents, maybe it's faster to copy the remaining documents over to a new index, then drop the old one completely and move an alias to point to the new index instead? That depends on your use case and the space available, I think, but it might be worth a try to compare.
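
A rough sketch of what the copy-and-switch approach could look like, assuming the reindex API (available from 2.3) and an alias that clients already use instead of the raw index name. The index names, the alias name and the example query are hypothetical, and the code only builds the request bodies; sending them to the cluster is left out. Wrapping the delete query in `bool` / `must_not` is one assumed way to select "everything that would have been kept".

```python
import json


def reindex_body(old_index, new_index, delete_query):
    """Body for POST /_reindex that copies only the documents NOT matched
    by the query you would otherwise have used for delete-by-query."""
    return {
        "source": {
            "index": old_index,
            "query": {"bool": {"must_not": [delete_query]}},
        },
        "dest": {"index": new_index},
    }


def alias_swap_body(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` from the old index to the
    new one in a single request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names and query, for illustration only.
delete_query = {"range": {"timestamp": {"lt": "2016-01-01"}}}
steps = [
    ("POST", "/_reindex", reindex_body("docs_v1", "docs_v2", delete_query)),
    ("POST", "/_aliases", alias_swap_body("docs", "docs_v1", "docs_v2")),
    ("DELETE", "/docs_v1", None),  # drop the old index once the alias has moved
]

for method, path, body in steps:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Dropping the old index removes its files outright, so the space comes back immediately rather than waiting for merges, at the cost of needing room for both indices while the copy runs.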

