I'm using the Elasticsearch Python client and want to delete all documents of a particular type. Currently I'm using helpers.scan to collect every matching _id and then issuing a bulk delete request.
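The poster's snippet is not included here. As a rough sketch of the scan-then-bulk-delete approach described, assuming the elasticsearch-py 2.x `helpers.scan` and `helpers.bulk` functions, it might look something like the following. The function names `delete_actions` and `delete_all_of_type` are made up for illustration and are not from the original post.

```python
def delete_actions(hits):
    """Turn search hits into bulk 'delete' actions for helpers.bulk."""
    for hit in hits:
        yield {
            "_op_type": "delete",
            "_index": hit["_index"],
            "_type": hit["_type"],
            "_id": hit["_id"],
        }


def delete_all_of_type(es, index, doc_type):
    """Scan every document of doc_type in index and bulk-delete it.

    `es` is assumed to be an elasticsearch.Elasticsearch client (2.x).
    """
    # Imported here so delete_actions() can be used without the client installed.
    from elasticsearch import helpers

    hits = helpers.scan(
        es,
        index=index,
        doc_type=doc_type,
        # Only the metadata (_index, _type, _id) is needed, so skip _source.
        query={"query": {"match_all": {}}, "_source": False},
    )
    # Sends the delete actions in chunks through the bulk API.
    return helpers.bulk(es, delete_actions(hits))
```

A call such as `delete_all_of_type(es, "customer1", "2017-01-01")` would then remove the documents of that one type; the index and type names here are placeholders.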
Is it a one-time operation or something you are planning to do often?
If the former, just run it in the Kibana dev console.
If the latter, please be aware that from 6.0 on you can have only one type per index, so removing a type actually means removing an index.
Finally, you should know that this removes only the documents, not the mapping, and that because of the way Elasticsearch works it can be a costly operation. It is often better to reindex without the type.
The system design in my case is such that each type is a date. Going forward, a new type is created each day, so each document has a copy in each type.
The reason I want to delete documents from a type is cleanup. In case of intra-day updates, I'd like to drop the previous copy of all docs and recreate that type. I'm not concerned about the mapping in this case and just want to clean up the documents. This has to be done multiple times a day, deleting around 1000 documents each time. Please note that I'm using Elasticsearch==2.4.1.
We have one index per customer, and the time series data is split across types. Do you mean there should be one index per customer per day? That would lead to a huge number of indices, as we have 3-4 years' worth of data: approximately 50K indices!
Filtered alias! I see. This might just work in our case. Does an alias behave exactly like an index for search and scroll? Can I simply use the alias wherever I currently use the index and expect everything to fall into place?
Also, how would a filtered alias help with the cleanup I was talking about? When you delete an alias, the data is still retained in the base index. How do I get rid of it?
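For reference, a filtered alias is defined through the aliases API with an `add` action that carries a filter. The sketch below only builds that request body. It assumes each document stores its date in a field called `day` (a hypothetical name, not from this thread) rather than in the type, and `filtered_alias_body` is likewise an illustrative helper.

```python
def filtered_alias_body(index, alias, field, value):
    """Build an aliases-API body that adds a term-filtered alias."""
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": alias,
                    # Only documents whose `field` equals `value` are visible
                    # through the alias.
                    "filter": {"term": {field: value}},
                }
            }
        ]
    }


# Hypothetical usage with an elasticsearch.Elasticsearch client `es`:
#   body = filtered_alias_body("customer1", "customer1-2017-01-01",
#                              "day", "2017-01-01")
#   es.indices.update_aliases(body=body)
#   es.search(index="customer1-2017-01-01", body={"query": {"match_all": {}}})
```

Note that the alias is only a filtered view: removing it leaves the documents in the base index, so they would still have to be deleted from that index separately.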
Is the way I'm issuing a bulk request to clean up a particular type considered good practice? I saw some anomalies in production with this method, as the data was not getting deleted completely.
Because of the way deleting documents works, it will most likely increase your data volume at first and only give you back disk space later, after an IO-intensive merge operation.
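One way to observe this is the `docs.deleted` counter in the index stats, which counts documents that are marked as deleted but not yet merged away. The helper below is a small sketch that computes the deleted share from a stats response; `deleted_ratio` is an illustrative name, and the response shape is assumed to be that of the indices stats API with the `docs` metric.

```python
def deleted_ratio(stats):
    """Fraction of primary-shard documents marked deleted but not yet merged."""
    docs = stats["_all"]["primaries"]["docs"]
    total = docs["count"] + docs["deleted"]
    return docs["deleted"] / float(total) if total else 0.0


# Hypothetical usage with an elasticsearch.Elasticsearch client `es`:
#   stats = es.indices.stats(index="customer1", metric="docs")
#   print(deleted_ratio(stats))
```

A ratio that keeps growing after repeated cleanups would suggest that the deleted documents are still occupying disk space until segments are merged.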