I'm trying to combine delete-by-query with range queries. The problem I'm hitting is "QueryParsingException: request does not support [filter]". I wonder if I'm digging in the right direction. Note that I'm limited to Elasticsearch 1.5.2 by AWS.
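For what it's worth, here is a minimal sketch of the shape that worked for me. On ES 1.x the delete-by-query body takes a top-level `query`, not a top-level `filter` (which is what triggers the QueryParsingException), so the range goes inside a `bool` query's `must`. The field name `timestamp` and the cutoff date are just placeholders for illustration:

```python
import json

# Assumed field name and cutoff -- adjust to your own mapping.
# On ES 1.x, a delete-by-query body must use a top-level "query";
# a top-level "filter" raises "request does not support [filter]".
body = {
    "query": {
        "bool": {
            "must": [
                {"range": {"timestamp": {"lt": "2015-01-01"}}}
            ]
        }
    }
}

# This JSON would be sent as the body of
# DELETE /my-index/_query on an ES 1.5.2 cluster.
print(json.dumps(body, indent=2))
```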
The request I posted removed exactly the documents I expected it to remove locally. I wonder whether the query could crash our cluster in production because of the amount of data we have. Is Elasticsearch smart enough not to load everything into memory, but to simply delete each document that matches the criteria?
Could you please give me more details about using "must" instead of "filter"?
We actually have one index per day, but the issue is that each index contains different kinds of documents. Some might be needed later, others can be removed. So one thing that comes to mind is splitting the existing per-day index into two separate indices: one for long-term data storage, another for data that should be removed after some time. That isn't a convenient approach for existing data, since it requires reindexing everything the new way, so the one-time query I wrote above might be handy for existing data if you have gigabytes in the ES cluster and it takes hours to reindex everything. I agree and understand that removing a whole index is wiser and more efficient, but I need a way to remove documents from all existing indices, at least for now.
If different types of data have different retention periods, they should be placed in separate indices, since, as you point out, that lets you manage retention directly by deleting indices. As you are already using daily indices, you can introduce this change for new indices going forward. If you really need to delete data from the existing indices, I guess reindexing or delete-by-query might still be required.
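To make the retention-by-index idea concrete, here is a small sketch of how you might compute which daily indices have fallen out of a retention window and can be deleted wholesale. The `logs-YYYY.MM.DD` naming convention and the helper itself are assumptions for illustration; in practice you would list the real index names from the cluster rather than generate them:

```python
from datetime import date, timedelta

def expired_indices(today, retention_days, prefix="logs-"):
    """Hypothetical helper: daily index names older than the retention window.

    In a real setup you would fetch actual index names from the cluster
    (e.g. via the cat indices API) and filter them against the cutoff;
    here we just generate a few names from the days before the cutoff.
    """
    cutoff = today - timedelta(days=retention_days)
    return [
        prefix + (cutoff - timedelta(days=d)).strftime("%Y.%m.%d")
        for d in range(1, 4)
    ]

# With a 7-day retention on 2015-06-10, indices from 2015-06-02 and
# earlier are candidates for deletion (DELETE /logs-2015.06.02, ...).
print(expired_indices(date(2015, 6, 10), 7))
```

Deleting a whole index this way is a cheap metadata operation, whereas delete-by-query has to find and remove every matching document individually, which is why the per-retention-index layout scales better.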