Python - Delete all docs from a type

Elasticsearch version: 2.4.1

I'm using the Elasticsearch Python client and want to delete all documents of a particular type. Currently I use helpers.scan to collect every matching _id and then issue a bulk delete request like this:

bulk_body = ['{"delete": {"_index": "", "_type": "2017-01-01", "_id": "1"}}',
'{"delete": {"_index": "", "_type": "2017-01-01", "_id": "2"}}',
'{"delete": {"_index": "", "_type": "2017-01-01", "_id": "3"}}']

es.bulk(bulk_body, request_timeout=50)

This doesn't look very efficient to me, as the whole operation is driven from the client side when it should be handled by the Elasticsearch server.

On inspection I found the delete-by-query API, which in this version is provided by the delete-by-query plugin.

Now I'm unable to find a Pythonic way to achieve this. It has already been answered here, but only as a curl request.
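For reference, one way to call the plugin from Python is sketched below. It assumes the delete-by-query plugin is installed on a 2.x cluster, where it exposes DELETE /{index}/{type}/_query, and it goes through the client's low-level transport.perform_request because this is not known to be wrapped by a dedicated method in every 2.x client release. The function names and the index name in the usage comment are hypothetical.

```python
def build_delete_by_query(index, doc_type):
    """Return (method, path, body) for the 2.x delete-by-query plugin endpoint.

    Assumption: the plugin is installed and serves DELETE /{index}/{type}/_query.
    """
    path = "/{0}/{1}/_query".format(index, doc_type)
    # match_all selects every document of the given type.
    body = {"query": {"match_all": {}}}
    return "DELETE", path, body


def delete_all_of_type(es, index, doc_type):
    """Send the request through the client's low-level transport.

    `es` is expected to be an elasticsearch.Elasticsearch instance.
    """
    method, path, body = build_delete_by_query(index, doc_type)
    return es.transport.perform_request(method, path, body=body)


# Hypothetical usage; "my-index" is a placeholder name:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# delete_all_of_type(es, "my-index", "2017-01-01")
```

The deletion then runs on the server instead of being scrolled and replayed by the client.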

Is it a one-time operation or something you are planning to do often?

If the former, just run it in the Kibana dev console.
If the latter, please be aware that from 6.0 on you can have only one type per index, so removing a type actually means removing an index.

Finally, you should know that this removes only the documents, not the mapping, and that given the way Elasticsearch works it can be a costly operation. It is often better to reindex without the type.

The system design in my case is such that each type is a date. Going forward, a new type is created each day, so each document has a copy in each type.

The reason I want to delete docs from a type is cleanup. In the case of intra-day updates, I'd like to drop the previous copy of all docs and recreate that type. I'm obviously not bothered about the mapping in this case and just want to clean up the documents. This has to be done multiple times a day, deleting around 1000 documents each time. Please note that I'm using Elasticsearch 2.4.1.

That's a bad practice.
Use time based indices.

We have an index per customer, and the time series data is then split across types. Do you mean there should be one index per customer per day? That would lead to a huge number of indices, as we have 3-4 years' worth of data: approximately 50K indices!

Or put all your customers inside the same index and create a filtered alias per customer.
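A filtered alias is created through the aliases API. The sketch below builds the request body for one customer; the field name customer_id, the alias naming scheme, and the index name in the usage comment are assumptions for illustration, not something taken from this thread.

```python
def filtered_alias_body(index, customer_id, field="customer_id"):
    """Build an _aliases request body that adds one filtered alias.

    Assumption: every document carries a not-analyzed `field` holding the
    customer identifier, so a term filter matches it exactly.
    """
    alias = "customer-{0}".format(customer_id)
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    "filter": {"term": {field: customer_id}},
                }
            }
        ]
    }


# Hypothetical usage; "events" is a placeholder index name:
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.update_aliases(body=filtered_alias_body("events", "acme"))
```

Searches sent to the alias name then only see the documents that match the filter.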

Filtered alias! I see. This might just work in our case. Does an alias behave exactly like an index in terms of search or scroll? Can I just use the alias wherever I use the index today and expect everything to fall into place?
Also, how would a filtered alias help with the cleanup I was talking about? When you delete an alias, the data is still retained in the base index. How do I get rid of it?

Is the way I'm issuing a bulk request to clean up a particular type considered good practice? I saw some anomalies in production with this method, as the data was not getting deleted completely.
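The bulk API reports failures per item rather than failing the whole request, so incomplete deletions can go unnoticed if the response is not inspected. The sketch below is one possible way to run the same scan-then-delete flow with the client's helpers and collect per-item errors; it is not a diagnosis of the anomaly described above. The function names are hypothetical.

```python
def to_delete_actions(hits):
    """Turn scan hits into delete actions understood by helpers.bulk."""
    for hit in hits:
        yield {
            "_op_type": "delete",
            "_index": hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
        }


def delete_type(es, index, doc_type):
    """Scroll over every document of a type and bulk-delete it.

    Returns (number_deleted, errors); `errors` lists the items that failed.
    """
    # Imported here so to_delete_actions stays usable without the package.
    from elasticsearch import helpers

    hits = helpers.scan(
        es,
        query={"query": {"match_all": {}}},
        index=index,
        doc_type=doc_type,
    )
    # raise_on_error=False collects failures instead of raising on the first one.
    return helpers.bulk(es, to_delete_actions(hits), raise_on_error=False)
```

Documents indexed after the scan has started are not covered by this flow, so a non-empty error list is only one of the things worth checking.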

If you have time based indices, cleaning old data by dropping the index is perfect.
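As a rough illustration of that cleanup, the sketch below derives a daily index name and shows, in a comment, how the whole day could be dropped with a single index deletion. The prefix "events" and the date format are assumptions, not names from this thread.

```python
from datetime import date


def daily_index(prefix, day):
    """Return the name of the index holding one day of data."""
    return "{0}-{1}".format(prefix, day.strftime("%Y.%m.%d"))


# Hypothetical usage: drop one whole day, then reindex it from scratch.
# from elasticsearch import Elasticsearch
# es = Elasticsearch()
# es.indices.delete(index=daily_index("events", date(2017, 1, 1)))
```

Dropping an index removes its files outright, so there are no per-document deletes to merge away afterwards.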

No. It's not recommended.

Any particular reason why it's not recommended? It's not mentioned in the Elasticsearch documentation.

Because of the way deleting documents works, it will most likely increase your data volume first and only give you back the disk space later, after some I/O-intensive merge operations.
