How to delete documents by term and timestamp range in elasticsearch 1.5.2?

vsadokhin · March 7, 2016, 4:23am

Hello Elasticsearch Support, Developers and Community Members,

I'm looking for a way to delete old documents from the indices of an ES cluster, with a request like this:

curl -XDELETE 'http://localhost:9200/myIndex-*/_query' -d '{
  "query" : {
    "term" : { "termName" : "termValue" },
    "filtered" : {
      "filter" : {
        "range" : {
          "@timestamp" : {
            "lt" : "now-30d"
          }
        }
      }
    }
  }
}'

I am trying to combine delete-by-query with a range condition. The problem I ran into is "QueryParsingException request does not support filter". I wonder if I'm heading in the right direction. Note that I'm limited to Elasticsearch 1.5.2 by AWS.

Thanks in advance,
Vasiliy

dadoonet · March 7, 2016, 6:06am

If your use case is really about dates, you should create one index per timeframe and then simply remove the old indices.

Much much more efficient (IO wise).

dadoonet · March 7, 2016, 6:07am

BTW, in 1.x, you need to use must instead of filter.

Look at the doc for 1.5.
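
For readers wondering what "must instead of filter" could look like in practice, here is a minimal sketch. It assumes the advice refers to the bool query, which in 1.x accepts must, should and must_not clauses but no filter clause. The helper name build_delete_body is made up for illustration; the field names, host and index pattern are the ones from the question.

```shell
# Build a delete-by-query body that combines a term match and a date range
# with bool/must, so no "filter" clause is needed.
# Arguments: $1 field name, $2 field value, $3 age in date math (e.g. 30d).
build_delete_body() {
  cat <<EOF
{
  "query": {
    "bool": {
      "must": [
        { "term": { "$1": "$2" } },
        { "range": { "@timestamp": { "lt": "now-$3" } } }
      ]
    }
  }
}
EOF
}

# Print the body for inspection; nothing is sent to the cluster here.
build_delete_body termName termValue 30d

# To actually run it (host and index pattern as in the question):
# build_delete_body termName termValue 30d |
#   curl -XDELETE 'http://localhost:9200/myIndex-*/_query' -d @-
```

Printing the body first makes it easy to check the JSON before pointing the same pipeline at a real cluster.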

vsadokhin · March 7, 2016, 6:42am

Thank you for your reply, David.

I actually managed to delete old documents with this query:

curl -XDELETE 'http://localhost:9200/myIndex-*/_query' -d '
{
  "query": {
    "filtered" : {
      "query" : {
        "term" : { "termName" : "termValue" }
      },
      "filter" : {
        "range" : { "@timestamp" : { "lt" : "now-30d" }}
      }
    }
  }
}'

Locally, this request removed exactly the documents I expected. I wonder whether the query could crash our production cluster because of the amount of data we have. Is Elasticsearch smart enough not to load everything into memory and simply delete each document that matches the criteria?
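
One cautious way to get a feel for the size of the operation before running it on production is to count the matching documents first. The sketch below assumes the _count API is reachable on the same host; old_docs_query and count_old_docs are hypothetical helper names. It says nothing about how delete-by-query uses memory internally, it only shows how many documents the criteria would hit.

```shell
# The same filtered query as in the delete request above, kept in one place
# so that the count and the delete use identical criteria.
old_docs_query() {
  cat <<'EOF'
{
  "query": {
    "filtered": {
      "query":  { "term":  { "termName": "termValue" } },
      "filter": { "range": { "@timestamp": { "lt": "now-30d" } } }
    }
  }
}
EOF
}

# Hypothetical helper: ask how many documents match, without deleting any.
# $1 is the index name or pattern.
count_old_docs() {
  old_docs_query | curl -s -XGET "http://localhost:9200/$1/_count" -d @-
}

# Print the query body for inspection; no request is sent here.
old_docs_query

# Example call against a running cluster:
# count_old_docs 'myIndex-*'
```

Counting per daily index instead of with the wildcard would also show which indices hold most of the documents to be removed.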

Could you please give me more details about using "must" instead of "filter"?

We actually have one index per day, but the issue is that each index contains different kinds of documents. Some might be required later; others can be removed. So one thing that comes to mind is splitting the existing per-day index into two indices: one for long-term storage, another for data that should be removed after some time. That is not convenient for existing data, as it requires reindexing everything the new way, so a one-time query like the one above might be handy when you have gigabytes in the ES cluster and reindexing takes hours. I agree and understand that removing a whole index is wiser and more efficient, but I need a way to remove documents from all existing indices, at least for now.

Christian_Dahlqvist · March 7, 2016, 7:14am

If different types of data have different retention periods, they should be placed in different indices, since this, as you point out, lets you manage retention directly by deleting indices. As you are already using daily indices, you can introduce this change for new indices. If you really need to delete data from the existing indices, I guess reindexing or delete-by-query might still be required.
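
As an illustration of managing retention by deleting whole indices, here is a minimal sketch that picks out daily indices older than a cutoff date. It assumes index names of the form prefix plus YYYY.MM.DD; the prefixes shortterm- and longterm- and the function name indices_older_than are hypothetical, not something from this thread.

```shell
# Read index names from stdin (one per line) and print those that start with
# the given prefix and whose date suffix is strictly older than the cutoff.
# $1: index prefix (e.g. "shortterm-"), $2: cutoff date as YYYY.MM.DD.
indices_older_than() {
  awk -v prefix="$1" -v cutoff="$2" '
    index($0, prefix) == 1 {
      day = substr($0, length(prefix) + 1)
      # Zero-padded YYYY.MM.DD strings sort chronologically as plain text.
      if (day < cutoff) print
    }'
}

# Demo with made-up index names; no cluster is contacted.
printf '%s\n' shortterm-2016.02.01 shortterm-2016.03.01 longterm-2016.01.01 |
  indices_older_than shortterm- 2016.02.06
# prints: shortterm-2016.02.01

# Against a real cluster, the names could come from the cat indices API and
# each selected index could then be dropped with the delete index API:
# curl -s 'http://localhost:9200/_cat/indices?h=index' |
#   indices_older_than shortterm- 2016.02.06 |
#   while read -r name; do curl -XDELETE "http://localhost:9200/$name"; done
```

Keeping the selection step separate from the delete step makes it possible to review the list of indices before anything is removed.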
