ES 2.3.2 Delete by Query increasing "Size" parameter is not helping

SKumarMN · December 12, 2016, 7:50am

Hi,

I am running the delete-by-query plugin as shown below. The index has around 600,000 documents, and the type I want to delete has 105,365 documents. Regardless of the value I set for size, the total time to delete the documents is the same. Is there a way to speed up the deletion?

curl -XDELETE "http://poc10uas.us.com:9200/defaultindex/portalregistry/_query?source={""query"":{""match_all"":{}}}&size=10000"

SKumarMN · December 12, 2016, 11:28am

@nik9000 @dadoonet please advise

dadoonet · December 12, 2016, 12:25pm

Please read carefully the end of: About the Elasticsearch category

dadoonet · December 17, 2016, 11:47am

Changing size won't help here.

You can try another strategy which is to reindex in another index using reindex API and see if it is worth it.

Deleting a lot of docs is slow by nature.

Is it a recurring operation? Or one time?

Maybe you can look at daily indices, depending on your case?
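
As a rough illustration of the reindex strategy suggested above: instead of deleting one type, copy everything except that type into a fresh index and then switch over. The sketch below only builds and prints a request body for the `_reindex` API; it does not contact a cluster. The destination name `defaultindex_v2` and the helper `build_reindex_body` are hypothetical, and the `type` query used for the exclusion is an assumption that should be checked against the cluster version in use.

```python
import json


def build_reindex_body(source_index, dest_index, excluded_type):
    """Build a _reindex body that copies every document from
    source_index to dest_index except those of excluded_type."""
    return {
        "source": {
            "index": source_index,
            # Assumption: the "type" query matches documents by mapping type.
            "query": {
                "bool": {
                    "must_not": {"type": {"value": excluded_type}}
                }
            },
        },
        "dest": {"index": dest_index},
    }


# "defaultindex_v2" is a made-up destination name for illustration.
body = build_reindex_body("defaultindex", "defaultindex_v2", "portalregistry")
print(json.dumps(body, indent=2))
```

The printed JSON would be sent as the body of `POST /_reindex`. Whether this beats the delete depends on how much data has to be copied, which is why it is worth measuring first.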

SKumarMN · December 19, 2016, 7:17am

Hi,

It's a recurring operation, and the data is not time-based, so we cannot split it into time-based indices.
Could you please let me know what the size parameter does? Is it similar to the scroll_size parameter in the ES 5.0 delete-by-query API?

dadoonet · December 19, 2016, 7:42am

Maybe you are using a 2.x version here? I'm surprised that size is accepted.

In 5.1, you can do Delete by query operations in parallel using slice:

https://www.elastic.co/guide/en/elasticsearch/reference/5.1/docs-delete-by-query.html#docs-delete-by-query-automatic-slice

So I think you'll be able to do that in less time than in the 2.x series, but with more load on the cluster.

Note that deleting a lot of data has an impact on I/O. Your cluster will probably do a lot of merge operations.

What will your actual query be? A match_all?
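
For the 5.x slicing mentioned above, a minimal sketch of manual slicing: each request carries a `slice` object with its own `id` and the shared `max`, and the requests are sent in parallel. The helper `build_sliced_delete_requests` is hypothetical and only builds the request paths and bodies; nothing is sent. The index and type names are taken from the curl command earlier in the thread, and the slice count of 4 is an arbitrary assumption.

```python
import json


def build_sliced_delete_requests(index, doc_type, query, slices):
    """Return (path, body) pairs, one per slice, for _delete_by_query."""
    if slices < 2:
        # Manual slicing is only meaningful with two or more slices.
        raise ValueError("slices must be at least 2")
    path = "/{}/{}/_delete_by_query".format(index, doc_type)
    return [
        (path, {"slice": {"id": i, "max": slices}, "query": query})
        for i in range(slices)
    ]


# 4 slices is an arbitrary choice for illustration.
requests_to_send = build_sliced_delete_requests(
    "defaultindex", "portalregistry", {"match_all": {}}, 4
)
for path, body in requests_to_send:
    print("POST", path, json.dumps(body))
```

Each pair would be sent as a separate `POST`, ideally concurrently. The linked 5.1 documentation also describes automatic slicing through a `slices` URL parameter, which avoids building the requests by hand.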

SKumarMN · December 19, 2016, 8:11am

Yes. The documentation that mentions size: (Using Delete-by-Query | Elasticsearch Plugins and Integrations [2.0] | Elastic)

Yes, the query is a match_all. curl -XDELETE "http://poc10uas.us.com:9200/defaultindex/portalregistry/_query?source={""query"":{""match_all"":{}}}&size=10000"

dadoonet · December 19, 2016, 8:37am

So it's definitely better to have one index per type. And simply drop the index.

So instead of sending your docs to defaultindex/portalregistry, send them to defaultindex-portalregistry/portalregistry.

When needed, just run DELETE defaultindex-portalregistry and it will be immediate!
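
The naming scheme above can be sketched as a pair of small helpers. These are hypothetical names written for illustration, not part of any Elasticsearch client; they only compute the paths that the indexing and cleanup requests would use, following the `defaultindex-portalregistry` pattern from this reply.

```python
def index_for_type(base_index, doc_type):
    """Name of the dedicated index for one type, e.g. defaultindex-portalregistry."""
    return "{}-{}".format(base_index, doc_type)


def document_path(base_index, doc_type, doc_id):
    """Path to index a single document into its per-type index."""
    return "/{}/{}/{}".format(index_for_type(base_index, doc_type), doc_type, doc_id)


def drop_request(base_index, doc_type):
    """HTTP method and path that remove all documents of a type at once."""
    return ("DELETE", "/" + index_for_type(base_index, doc_type))


print(document_path("defaultindex", "portalregistry", "42"))
print(*drop_request("defaultindex", "portalregistry"))
```

Dropping the index removes its files directly instead of marking documents as deleted one by one, so the cleanup no longer scales with the number of documents.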

SKumarMN · December 19, 2016, 8:54am

We have one index with multiple types under it. Each type has roughly the same number of documents, barring a few. Typically we have 1 index with around 30 types.

Each type has roughly 40,000 - 60,000 documents.

Say I want to move each type to a new index: even if I set the number of primary shards to 1 or 2, won't it affect overall system performance, since there will be many shards, each contending for resources? Would it be a good design, considering we have 3 ES servers with 16 GB ES_HEAP_SIZE and multi-core CPUs?

dadoonet · December 19, 2016, 9:45am

I'd create one index per type with one single shard. So you will end up with 60 shards (including 1 replica) on 3 nodes, which is around 20 shards per node. It looks reasonable to me.

Note that we have been discussing for a while the possibility of removing types.

So with the one-type-per-index strategy, you will be ready for that.
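
The shard estimate in this reply can be reproduced with a few lines of arithmetic. The inputs come from the thread (about 30 types, 1 primary shard per index, 1 replica, 3 nodes); the function name `shard_plan` is made up for illustration, and the even spread across nodes is an assumption.

```python
def shard_plan(num_types, primaries_per_index, replicas, nodes):
    """Return (total shards, average shards per node) for one index per type."""
    # Each primary shard has `replicas` additional copies.
    total = num_types * primaries_per_index * (1 + replicas)
    return total, total / nodes


total, per_node = shard_plan(num_types=30, primaries_per_index=1, replicas=1, nodes=3)
print(total, per_node)  # 60 20.0
```

With 2 primaries per index instead of 1, the same formula gives 120 shards, or 40 per node, which is why a single primary shard is the suggestion here.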

SKumarMN · December 20, 2016, 10:50am

Nice. Thanks for the info.

system · January 17, 2017, 10:50am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
