Delete by query deletes only 1000 documents, then quits

I am using the following API to delete documents older than 60 days:

POST /index_name/_delete_by_query?conflicts=proceed
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-60d/d" }
    }
  }
}

My index is quite big: 1.2 TB. When I run this API, it deletes at most 1000 documents and then quits.

Welcome.

That's really inefficient. A delete request actually writes more data to disk before eventually removing it.

Instead use time based indices and simply delete the indices you don't need anymore.
You can use ILM to automate all that.

See ILM: Manage the index lifecycle | Elasticsearch Guide [8.11] | Elastic
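As a rough sketch, once you have time-based indices, an ILM policy like the following (the policy name here is just an example) would delete each index 60 days after it enters the policy:

PUT _ilm/policy/delete-after-60d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

You would then attach the policy to your index template so new daily indices pick it up automatically.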

I know index deletion is faster. In fact, we are planning to move to daily indices in the near future. Until then, however, I have to remove old documents. Could you please help me optimize the query, if there is any way?

I think it'd be better to reindex the data you want to keep instead.
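As a sketch (the destination index name is just an example), you could copy only the documents from the last 60 days into a new index, and delete the original index once you've verified the copy:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "index_name",
    "query": {
      "range": { "@timestamp": { "gt": "now-60d/d" } }
    }
  },
  "dest": {
    "index": "index_name-v2"
  }
}

After checking the new index, DELETE /index_name removes the old data in one cheap operation instead of millions of individual deletes.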

What is your version?

Version 7.17.

You can try adding

wait_for_completion=false

so that the request runs asynchronously.
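For example, reusing the index name and query from your original request:

POST /index_name/_delete_by_query?conflicts=proceed&wait_for_completion=false
{
  "query": {
    "range": { "@timestamp": { "lte": "now-60d/d" } }
  }
}

This returns a task id immediately, and you can check the task's progress with the task management API, e.g. GET _tasks/<task_id>.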

I tried that and got the result below:

{
  "completed" : true,
  "task" : {
    "node" : "Z19SnYRVRTqf9G_kgaC4Yg",
    "id" : 1623007937,
    "type" : "transport",
    "action" : "indices:data/write/delete/byquery",
    "status" : {
      "total" : 18231930,
      "updated" : 0,
      "created" : 0,
      "deleted" : 1000,
      "batches" : 1,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : "delete-by-query [index-name]",
    "start_time_in_millis" : 1704633497441,
    "running_time_in_nanos" : 493323541261,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "error" : {
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [30575029]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [30575029]"
    }
  }
}

Not sure what is happening on your cluster. Is it overloaded at the moment?

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Maybe you could reduce scroll_size to 100 and try again? Or increase the scroll keep-alive, e.g. scroll=10m (I don't remember what the default value is).
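For illustration, the full request with those parameters might look like this (the timeout value here is just an example; note that scroll_size defaults to 1000, which matches the single batch of 1000 deletions in your output before the search context was lost):

POST /index_name/_delete_by_query?conflicts=proceed&scroll_size=100&scroll=10m&wait_for_completion=false
{
  "query": {
    "range": { "@timestamp": { "lte": "now-60d/d" } }
  }
}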

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.