Python - Delete all docs from a type

jaisharma · February 23, 2018, 5:46am

Elasticsearch version: 2.4.1

I'm using the Elasticsearch Python client and want to delete all documents of a particular type. Currently I'm using helpers.scan to collect every matching _id and then issuing a bulk delete request like this:

bulk_body = ['{"delete": {"_index": "talend.com", "_type": "2017-01-01", "_id": "1"}}',
'{"delete": {"_index": "talend.com", "_type": "2017-01-01", "_id": "2"}}',
'{"delete": {"_index": "talend.com", "_type": "2017-01-01", "_id": "3"}}']

es.bulk(bulk_body, request_timeout=50)

This doesn't look very efficient to me, since the whole operation is driven from the client side when it should be handled by the ES server.

On inspection I found the delete-by-query API, which in this version is provided by the delete-by-query plugin.

However, I can't find a Pythonic way to do this from the client. This has already been answered here, but only with a curl request.
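For reference, the scan-then-bulk approach described in this post can be written with the client's own helpers instead of hand-built JSON strings. This is only a minimal sketch, and it is still the client-side approach, not delete-by-query. The function names `delete_actions` and `delete_all_of_type` are made up for illustration, and `es` is assumed to be an `elasticsearch.Elasticsearch` instance.

```python
def delete_actions(hits):
    """Yield one bulk delete action per search hit."""
    for hit in hits:
        yield {
            "_op_type": "delete",
            "_index": hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
        }


def delete_all_of_type(es, index, doc_type):
    """Scan every document of one type and delete it in bulk.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # Imported here so delete_actions can be used without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(
        es,
        query={"query": {"match_all": {}}},
        index=index,
        doc_type=doc_type,
    )
    # helpers.bulk returns a (success_count, errors) tuple and, by default,
    # raises BulkIndexError if any action fails.
    return helpers.bulk(es, delete_actions(hits), request_timeout=50)
```

Calling `delete_all_of_type(es, "talend.com", "2017-01-01")` would then cover the three-document example above. Because failed actions raise by default, partial deletions should surface as errors rather than pass silently.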

dadoonet · February 23, 2018, 6:36am

Is it a one-time operation or something you are planning to do often?

If the former, just run it in the Kibana dev console.
If the latter, please be aware that with 6.0 you can have only one type per index, so removing a type actually means removing an index.

Finally, you should know that this only removes the documents, not the mapping, and that because of the way Elasticsearch works it can be a costly operation. It is often better to reindex without the type.

jaisharma · February 23, 2018, 6:52am

The system design in my case is such that each type is a date. Going forward, a new type is created each day, so each document has a copy in each type.

The reason I want to delete docs from a type is cleanup. Basically, in case of intraday updates, I'd like to drop the previous copy of all docs and recreate that type. I'm obviously not bothered about the mapping in this case and just want to clean up the documents. This has to be done multiple times a day, deleting around 1000 documents each time. Please note that I'm using Elasticsearch 2.4.1.

dadoonet · February 23, 2018, 7:05am

That's a bad practice.
Use time based indices.

jaisharma · February 23, 2018, 7:07am

We have one index per customer, and the time series data is split across types. Do you mean there should be one index per customer per day? That would lead to a huge number of indices, since we have 3-4 years' worth of data: approximately 50K indices!

dadoonet · February 23, 2018, 7:30am

Or put all your customers inside the same index and create filtered aliases per customer.
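A filtered alias of this kind could be created from the Python client roughly as follows. This is a sketch under assumptions: the field name `customer_id`, the `customer-<id>` alias naming, and both function names are hypothetical, and `es` is assumed to be an `elasticsearch.Elasticsearch` instance.

```python
def customer_alias_actions(index, customer_id):
    """Build an update_aliases body exposing only one customer's documents."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    # Hypothetical naming scheme for the per-customer alias.
                    "alias": "customer-%s" % customer_id,
                    # Hypothetical field holding the customer identifier.
                    "filter": {"term": {"customer_id": customer_id}},
                }
            }
        ]
    }


def create_customer_alias(es, index, customer_id):
    """Register the filtered alias on the cluster."""
    return es.indices.update_aliases(
        body=customer_alias_actions(index, customer_id)
    )
```

Searches sent to the alias name instead of the index name have the filter applied automatically, so each customer only sees its own documents even though they share one index.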

jaisharma · February 27, 2018, 6:05am

Filtered alias! I see. This might just work in our case. Does an alias behave exactly like an index for search and scroll? Can I just replace the index with an alias and expect everything to fall into place?
Also, how would a filtered alias help with the cleanup I was talking about? When you delete an alias, the data is still retained in the underlying index. How do I get rid of it?

jaisharma · February 27, 2018, 6:10am

Is the way I'm issuing a bulk request to clean up a particular type considered good practice? I saw some anomalies in production with this method: the data was not getting deleted completely.

dadoonet · February 27, 2018, 2:02pm

If you have time-based indices, cleaning up old data by dropping the index is perfect.
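With time-based indices, the cleanup becomes a single index deletion rather than many document deletions. A minimal sketch, assuming one shared index per day named `<prefix>-YYYY.MM.DD` (the naming scheme, the `docs` prefix, and both function names are illustrative, not something from this thread):

```python
from datetime import date


def daily_index_name(prefix, day):
    """Return the index name for one day, e.g. docs-2017.01.01."""
    return "%s-%s" % (prefix, day.strftime("%Y.%m.%d"))


def drop_day(es, prefix, day):
    """Delete the whole index holding one day's documents.

    `es` is assumed to be an elasticsearch.Elasticsearch client.
    """
    # ignore=404 keeps the call from raising if the index does not exist.
    return es.indices.delete(index=daily_index_name(prefix, day), ignore=404)
```

For the intraday case, `drop_day(es, "docs", date(2017, 1, 1))` followed by reindexing that day's documents would replace the scan-and-bulk-delete step.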

dadoonet · February 27, 2018, 2:02pm

No. It's not recommended.

jaisharma · March 8, 2018, 9:20am

Any particular reason why it's not recommended? It's not mentioned in the Elasticsearch documentation.

dadoonet · March 8, 2018, 11:00am

Because of the way deleting documents works, it will most likely increase your data volume at first, and only give you back the disk space later, after some IO-intensive merge operations.

system · April 5, 2018, 11:00am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
