ES 2.3.2 Delete by Query increasing "Size" parameter is not helping


(SK) #1

Hi,

I am running the delete-by-query plugin as shown below. The index has around 600,000 documents, and the type I want to delete has 105,365 documents. Regardless of the value I set for size, the total time to delete the documents is the same. Is there a way to speed up the delete?

curl -XDELETE "http://poc10uas.us.com:9200/defaultindex/portalregistry/_query?source={""query"":{""match_all"":{}}}&size=10000"


(SK) #2

@nik9000 @dadoonet please advise


(David Pilato) #3

Please read carefully the end of: About the Elasticsearch category


(David Pilato) #4

Changing size won't help here.

You can try another strategy: reindex into another index using the reindex API and see whether it is worth it.

Deleting a lot of docs is slow by nature.

Is it a recurring operation? Or one time?

Maybe you can look at daily indices, depending on your case?
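A minimal sketch of the reindex strategy, under one reading of the suggestion above: copy every document except the unwanted type into a new index, then drop the old index. The destination name `defaultindex_v2` is hypothetical, and the `bool`/`must_not` wrapper around a `type` query is an assumption about how to exclude the type. The code only builds the request body; it does not contact a cluster.

```python
import json


def build_reindex_body(source_index, dest_index, excluded_type):
    """Build a _reindex body that copies everything except one type."""
    return {
        "source": {
            "index": source_index,
            # Assumption: must_not + type query skips the type to be discarded.
            "query": {"bool": {"must_not": {"type": {"value": excluded_type}}}},
        },
        "dest": {"index": dest_index},
    }


# "defaultindex_v2" is a hypothetical destination index name.
body = build_reindex_body("defaultindex", "defaultindex_v2", "portalregistry")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

Whether this beats a delete by query depends on how much data has to be copied compared with how much has to be deleted, so it is worth measuring on real data.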


(SK) #5

Hi,

It's a recurring operation, and the data is not time-based, so it cannot be split into time-based indices.
Could you please tell me what the size parameter does? Is it similar to the scroll_size parameter in the ES 5.0 delete-by-query API?


(David Pilato) #6

Maybe you are using a 2.x version here? I'm surprised that size is accepted.

In 5.1, you can do Delete by query operations in parallel using slice:

https://www.elastic.co/guide/en/elasticsearch/reference/5.1/docs-delete-by-query.html#docs-delete-by-query-automatic-slice

So I think you'll be able to do that in less time than in the 2.x series, but with more load on the cluster.

Note that deleting a lot of data has an impact on I/O. Your cluster will probably do a lot of merge operations.
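A minimal sketch of what a sliced delete by query request could look like on 5.1, for illustration only. The host `localhost:9200`, the slice count of 3, and the scroll size of 1000 are assumptions, not tuned recommendations. The code only assembles the method, URL, and body; nothing is sent.

```python
import json
from urllib.parse import urlencode


def build_delete_by_query(base_url, index, doc_type, slices, scroll_size):
    """Return (method, url, body) for a sliced _delete_by_query request."""
    # slices runs the operation in parallel; scroll_size is the batch size.
    params = urlencode({"slices": slices, "scroll_size": scroll_size})
    url = f"{base_url}/{index}/{doc_type}/_delete_by_query?{params}"
    body = json.dumps({"query": {"match_all": {}}})
    return "POST", url, body


# Hypothetical host and parameter values.
method, url, body = build_delete_by_query(
    "http://localhost:9200", "defaultindex", "portalregistry", 3, 1000
)
print(method, url)
print(body)
```

More slices means more parallel work, which is where the extra cluster load mentioned above comes from.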

What will your actual query be? A match_all?


(SK) #7

Yes. This is the documentation that mentions size: https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/delete-by-query-usage.html

Yes, the query is a match_all. curl -XDELETE "http://poc10uas.us.com:9200/defaultindex/portalregistry/_query?source={""query"":{""match_all"":{}}}&size=10000"


(David Pilato) #8

So it's definitely better to have one index per type and simply drop the index.

So instead of sending your docs to defaultindex/portalregistry, send them to defaultindex-portalregistry/portalregistry.

When needed, just run DELETE defaultindex-portalregistry and it will be immediate!
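A minimal sketch of the naming scheme described above, assuming each type gets an index named `<base index>-<type>`. The helper names and the host `localhost:9200` are hypothetical. The code only builds the requests and sends nothing.

```python
def index_for_type(base_index, doc_type):
    """Map a type to its own index, e.g. defaultindex-portalregistry."""
    return f"{base_index}-{doc_type}"


def index_doc_url(base_url, base_index, doc_type, doc_id):
    """URL for indexing one document into the per-type index."""
    return f"{base_url}/{index_for_type(base_index, doc_type)}/{doc_type}/{doc_id}"


def drop_index_request(base_url, base_index, doc_type):
    """Request that removes every document of the type by dropping its index."""
    return "DELETE", f"{base_url}/{index_for_type(base_index, doc_type)}"


# Hypothetical host and document id.
host = "http://localhost:9200"
print(index_doc_url(host, "defaultindex", "portalregistry", "1"))
print(*drop_index_request(host, "defaultindex", "portalregistry"))
```

Dropping an index removes it as a whole instead of marking documents as deleted one by one, which is why it avoids the slow path of a delete by query.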


(SK) #9

We have one index with multiple types under it. Each type has roughly the same number of documents, barring a few. Typically we have 1 index with around 30 types.

Each type has roughly 40,000 to 60,000 documents.

If I move each type to a new index, even with the shards set to 1 or 2 (primary), won't it affect overall system performance, since there will be many shards, each contending for resources? Would it be a good design, considering we have 3 ES servers with a 16 GB ES_HEAP_SIZE and multi-core CPUs?


(David Pilato) #10

I'd create one index per type with one single shard. So you will end up with 60 shards (including 1 replica) on 3 nodes, which is around 20 shards per node. That looks reasonable to me.

Note that we have been discussing for a while the possibility of removing types.

So with the one-type-per-index strategy, you will be ready for that :)
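The shard estimate above can be checked with a few lines of arithmetic. The inputs come from the thread: around 30 types, one primary shard per index, 1 replica, and 3 nodes. An even spread of shards across nodes is an assumption.

```python
types = 30               # one index per type
primaries_per_index = 1  # single primary shard per index
replicas = 1             # one replica copy of each primary
nodes = 3

# Every primary shard also exists as `replicas` extra copies.
total_shards = types * primaries_per_index * (1 + replicas)
# Assumes the shards are spread evenly across the nodes.
shards_per_node = total_shards // nodes

print(total_shards, shards_per_node)  # 60 20
```

Raising the primaries to 2 per index would double both figures, to 120 shards in total and 40 per node.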


(SK) #11

Nice. Thanks for the info.

