How to delete documents by term and timestamp range in Elasticsearch 1.5.2?


(Vasiliy) #1

Hello Elasticsearch Support, Developers and Community Members,

I'm looking for a way to delete old documents from the indices of an ES cluster, along these lines:

curl -XDELETE 'http://localhost:9200/myIndex-*/_query' -d '{
  "query" : {
    "term" : { "termName" : "termValue" },
    "filtered" : {
      "filter" : {
        "range" : {
          "@timestamp" : {
            "lt" : "now-30d"
          }
        }
      }
    }
  }
}'

I am trying to combine delete-by-query with a range filter. The problem I ran into is the error "QueryParsingException request does not support filter". I wonder whether I'm digging in the right direction. Note that AWS limits me to Elasticsearch 1.5.2.

Thanks in advance,
Vasiliy


(David Pilato) #2

If your use case is really about dates, you should create one index per timeframe and then simply remove the old indices.

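That is much, much more efficient (I/O-wise): dropping an index removes its files outright, whereas deleting individual documents only marks them as deleted until segments are merged.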
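
As a rough sketch of what dropping a whole daily index could look like: the index prefix is the placeholder used in this thread, and the YYYY.MM.DD suffix is an assumption about how the daily indices are named, so adjust both to your setup. The delete call is wrapped in a function so that nothing is removed until you call it.

```shell
#!/bin/sh
# Assumptions (hypothetical, not confirmed in the thread):
# daily indices are named <prefix>YYYY.MM.DD.
# "myIndex-" is the thread's placeholder; real index names must be lowercase.
ES_URL='http://localhost:9200'
INDEX_PREFIX='myIndex-'

# Print the name of the daily index for a date given as YYYY.MM.DD.
index_for_date() {
  printf '%s%s\n' "$INDEX_PREFIX" "$1"
}

# Delete one whole index by name (delete index API).
drop_index() {
  curl -XDELETE "$ES_URL/$1"
}
```

For example, `drop_index "$(index_for_date 2015.05.01)"` would remove that single day. With GNU date, `date -d '30 days ago' +%Y.%m.%d` could supply the date argument.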


(David Pilato) #3

BTW, in 1.x, you need to use must (inside a bool query) instead of filter.

Have a look at the documentation for 1.5.
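
A minimal sketch of how that might look for the request in the first post, assuming the same placeholder index pattern, field name and value. Both conditions go under "must" in a bool query, so a document is deleted only if it matches the term and is older than 30 days. The curl call sits in a function so that nothing is deleted until it is invoked.

```shell
#!/bin/sh
# Placeholders copied from the thread; real index names must be lowercase.
ES_URL='http://localhost:9200'
INDEX_PATTERN='myIndex-*'

# bool query for 1.x: every clause listed under "must" has to match.
QUERY='{
  "query": {
    "bool": {
      "must": [
        { "term":  { "termName": "termValue" } },
        { "range": { "@timestamp": { "lt": "now-30d" } } }
      ]
    }
  }
}'

# Delete-by-query against all indices matching the pattern.
delete_old_docs() {
  curl -XDELETE "$ES_URL/$INDEX_PATTERN/_query" -d "$QUERY"
}
```

Calling `delete_old_docs` sends the request. It is probably worth trying it against a test index first.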


(Vasiliy) #4

Thank you for your reply, David.

I actually managed to delete old documents with this query:

curl -XDELETE 'http://localhost:9200/myIndex-*/_query' -d '
{
  "query": {
    "filtered" : {
      "query" : {
        "term" : { "termName" : "termValue" }
      },
      "filter" : {
        "range" : { "@timestamp" : { "lt" : "now-30d" }}
      }
    }
  }
}'

On my local setup, the request above removed exactly the documents I expected it to remove. I wonder whether the query could crash our production cluster because of the amount of data we have. Is Elasticsearch smart enough not to load everything into memory, and simply delete each document that matches the criteria?

Could you please give me more details about using "must" instead of "filter"?

We actually have one index per day, but the issue is that each index contains different kinds of documents. Some might be required later; others can be removed. So one thing that comes to mind is splitting the existing per-day index into two different indices: one for long-term storage, another for data that should be removed after some time. That is not convenient for existing data, because it requires reindexing everything the new way. A one-time query like the one above might therefore be handy for existing data when you have gigabytes in the ES cluster and reindexing everything takes hours. I agree and understand that removing a whole index is wiser and more efficient, but I need a way to remove documents from all existing indices, at least for now.


(Christian Dahlqvist) #5

If different types of data have different retention periods, they should be placed in different indices, since this, as you point out, lets you manage retention directly by deleting indices. As you are already using daily indices, you can introduce this change for new indices. If you really need to delete data from the existing indices, I guess reindexing or using delete-by-query might still be required.
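
To illustrate the split for new indices, here is a small sketch of one possible naming scheme. The retention class names "longterm" and "shortterm" and the date suffix are purely hypothetical; the prefix is the placeholder used in this thread.

```shell
#!/bin/sh
# Hypothetical scheme: <prefix>-<retention class>-YYYY.MM.DD
# "myIndex" is the thread's placeholder; real index names must be lowercase.
ES_URL='http://localhost:9200'
INDEX_PREFIX='myIndex'

# Print the index name for a retention class and a date.
index_name() {
  printf '%s-%s-%s\n' "$INDEX_PREFIX" "$1" "$2"
}

# Drop only the short-lived index for one day; the long-term index
# for the same day is left untouched.
drop_shortterm_index() {
  curl -XDELETE "$ES_URL/$(index_name shortterm "$1")"
}
```

With a layout like this, documents are written to the index matching their retention class, and `drop_shortterm_index 2015.05.01` would remove one day of expirable data without any delete-by-query.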

