Python - Delete all docs from a type

(Jai Sharma) #1

Elasticsearch version: 2.4.1

I'm using the ES Python client and want to delete all documents of a particular type. Currently I'm using helpers.scan to collect every matching _id and then issuing a bulk delete request like this:

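from elasticsearch import Elasticsearch

es = Elasticsearch()  # connects to localhost:9200 by default

# One delete action per document id returned by the scan
bulk_body = ['{"delete": {"_index": "", "_type": "2017-01-01", "_id": "1"}}',
             '{"delete": {"_index": "", "_type": "2017-01-01", "_id": "2"}}',
             '{"delete": {"_index": "", "_type": "2017-01-01", "_id": "3"}}']

es.bulk(bulk_body, request_timeout=50)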

This doesn't look very efficient to me, as the whole operation is driven from the client side when it should be handled by the ES server.
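For reference, the scan-then-bulk approach described above might look roughly like the sketch below. It assumes the `elasticsearch` 2.x Python client; `to_delete_actions` and `delete_type_client_side` are hypothetical helper names used for illustration, not part of the library.

```python
def to_delete_actions(hits):
    """Turn search hits into delete actions in the format helpers.bulk accepts."""
    for hit in hits:
        yield {
            "_op_type": "delete",
            "_index": hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
        }


def delete_type_client_side(es, index, doc_type):
    """Scan every document of one type and delete it with bulk requests."""
    # Imported here so to_delete_actions works without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(
        es,
        index=index,
        doc_type=doc_type,
        query={"query": {"match_all": {}}},
    )
    # helpers.bulk batches the generated actions into bulk requests.
    return helpers.bulk(es, to_delete_actions(hits))
```

Using a generator keeps memory flat: ids are streamed from the scroll into the bulk helper instead of being collected into one large list first.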

On further inspection I found the delete-by-query API, which in this version is implemented by the delete-by-query plugin.

However, I'm unable to find a Pythonic way to achieve the above functionality. This has already been answered here, but only via a curl request.
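One possible way to issue the same request from Python is sketched below. It assumes the delete-by-query plugin is installed on the 2.4.1 cluster, which exposes `DELETE /{index}/{type}/_query`. The sketch does not assume that the 2.x Python client has a dedicated method for the plugin, so it goes through the client's low-level `transport.perform_request`. The function names are hypothetical.

```python
def delete_by_query_path(index, doc_type):
    """Endpoint exposed by the delete-by-query plugin on Elasticsearch 2.x."""
    return "/%s/%s/_query" % (index, doc_type)


def delete_all_of_type(es, index, doc_type):
    """Ask the server to delete every document of one type in one request."""
    body = {"query": {"match_all": {}}}
    # Assumption: `es` is an elasticsearch.Elasticsearch instance, whose
    # `transport` attribute sends raw requests to the cluster.
    return es.transport.perform_request(
        "DELETE", delete_by_query_path(index, doc_type), body=body
    )
```

A call would then look like `delete_all_of_type(es, "my-index", "2017-01-01")`, where `my-index` is a placeholder for the real index name.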

(David Pilato) #2

Is it a one-time operation or something you are planning to do often?

If the former, just run it in the Kibana dev console.
If the latter, please be aware that with 6.0 you can have only one type per index, so removing a type actually amounts to removing an index.

Finally, be aware that it removes only the documents, not the mapping, and that because of the way Elasticsearch works it can be a costly operation. It is often better to reindex without the type.

(Jai Sharma) #3

The system design in my case is such that each type is a date. Going forward, a new type is created each day, so each document has a copy in each type.

The reason I want to delete docs from a type is cleanup. In the case of intra-day updates, I'd like to drop the previous copy of all docs and recreate that type. I'm not concerned about the mapping here; I just want to clean up the documents. This has to be done multiple times a day, deleting around 1000 documents each time. Please note that I'm using Elasticsearch 2.4.1.

(David Pilato) #4

That's bad practice.
Use time-based indices.

(Jai Sharma) #5

We have an index per customer, and the time series data is split across types. Do you mean there should be one index per customer per day? That would lead to a huge number of indices, as we have 3-4 years' worth of data: approximately 50K indices!

(David Pilato) #6

Or put all your customers inside the same index and create a filtered alias per customer.
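As a rough illustration of the filtered-alias idea, the sketch below builds the request body for the `_aliases` API. The field name `customer_id`, the `customer-<id>` alias naming scheme, and the function name are assumptions made for the example; the real mapping may differ.

```python
def customer_alias_body(index, customer_id, field="customer_id"):
    """Build an _aliases request that adds one filtered alias for a customer."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    # Hypothetical naming scheme: one alias per customer.
                    "alias": "customer-%s" % customer_id,
                    # Searches through the alias only see this customer's docs.
                    "filter": {"term": {field: customer_id}},
                }
            }
        ]
    }
```

With the Python client, the body could be sent with `es.indices.update_aliases(body=customer_alias_body("all-customers", "42"))`, where `all-customers` is a placeholder index name.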

(Jai Sharma) #7

Filtered aliases! I see. This might just work in our case. Does an alias behave exactly like an index for search and scroll? Can I simply replace the index with an alias and expect everything to fall into place?
Also, how would a filtered alias help with the cleanup I was talking about? When you delete an alias, the data is still retained in the underlying index. How do I get rid of it?

(Jai Sharma) #8

Is the way I'm issuing a bulk request to clean up a particular type considered good practice? I saw some anomalies in production with this method: the data was not getting deleted completely.
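One way to investigate incomplete deletes is to inspect the bulk response, which reports a status per item under an `items` list. The sketch below only surfaces items whose status is not 2xx; it does not diagnose the cause of the anomalies mentioned above. `failed_bulk_items` is a hypothetical helper name.

```python
def failed_bulk_items(response):
    """Return (operation, id, status) for every bulk item that did not succeed."""
    failures = []
    for item in response.get("items", []):
        # Each item is a single-key dict such as {"delete": {...}}.
        for op_type, result in item.items():
            status = result.get("status", 0)
            if not 200 <= status < 300:
                failures.append((op_type, result.get("_id"), status))
    return failures
```

It could be applied to the return value of `es.bulk(...)`, for example to log any ids that were reported as not found or rejected.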

(David Pilato) #9

If you have time-based indices, cleaning up old data by dropping the whole index is perfect.
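A minimal sketch of that cleanup, assuming one index per day: the naming scheme `<prefix>-YYYY.MM.DD` and both function names are assumptions for illustration, not something prescribed in this thread.

```python
from datetime import date


def daily_index_name(prefix, day):
    """Name of the index holding one day of data, e.g. 'events-2017.01.01'."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))


def drop_day(es, prefix, day):
    """Remove a whole day of data by deleting its index."""
    # ignore=404 makes the call a no-op when the index does not exist yet.
    return es.indices.delete(index=daily_index_name(prefix, day), ignore=404)
```

For the intra-day refresh described earlier, this would mean calling `drop_day(es, "events", date(2017, 1, 1))` and then indexing the fresh copy of that day's documents, where `events` is a placeholder prefix.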

(David Pilato) #10

No. It's not recommended.

(Jai Sharma) #11

Is there a particular reason why it's not recommended? This isn't mentioned in the Elasticsearch documentation.

(David Pilato) #12

Because of the way deleting documents works, it will most likely increase your data volume at first and only give the disk space back later, after an I/O-intensive merge operation.
