Delete by query deletes only 1000 documents, then quits

I am using the following API to delete documents older than 60 days:

POST /index_name/_delete_by_query?conflicts=proceed
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-60d/d" }
    }
  }
}

My index is quite big: 1.2 TB. When I run this API, it deletes at most 1000 documents and then quits.

Welcome.

That's really inefficient. A delete request actually writes more data to disk before eventually removing it.

Instead use time based indices and simply delete the indices you don't need anymore.
You can use ILM to automate all that.

See ILM: Manage the index lifecycle | Elasticsearch Guide [8.11] | Elastic
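As a rough sketch, once you have time-based indices, an ILM policy like the following (the policy name here is just an example) would delete each index 60 days after it enters the policy:

PUT _ilm/policy/delete-after-60d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

You would then attach the policy to your index template so new daily indices pick it up automatically.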

I know index deletion is faster. In fact, we are planning to move to daily indices in the near future. Until then, however, I have to remove old documents. Could you please help me optimize the query, if there is any way?

I think it'd be better to reindex the data you want to keep instead.
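As a sketch (the destination index name is just an example), you could copy only the documents from the last 60 days into a new index, and delete the original index once you've verified the copy:

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "index_name",
    "query": {
      "range": { "@timestamp": { "gt": "now-60d/d" } }
    }
  },
  "dest": {
    "index": "index_name-v2"
  }
}

After checking the new index, DELETE /index_name removes the old data in one cheap operation instead of millions of individual deletes.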

What is your version?

Version 7.17.

You can try adding

wait_for_completion=false

so that the request runs asynchronously.
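For example, reusing the index name and query from your original request:

POST /index_name/_delete_by_query?conflicts=proceed&wait_for_completion=false
{
  "query": {
    "range": { "@timestamp": { "lte": "now-60d/d" } }
  }
}

This returns a task id immediately, and you can check the task's progress with the task management API, e.g. GET _tasks/<task_id>.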

I tried that and got the result below:

{
  "completed" : true,
  "task" : {
    "node" : "Z19SnYRVRTqf9G_kgaC4Yg",
    "id" : 1623007937,
    "type" : "transport",
    "action" : "indices:data/write/delete/byquery",
    "status" : {
      "total" : 18231930,
      "updated" : 0,
      "created" : 0,
      "deleted" : 1000,
      "batches" : 1,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : "delete-by-query [index-name]",
    "start_time_in_millis" : 1704633497441,
    "running_time_in_nanos" : 493323541261,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "error" : {
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [30575029]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [30575029]"
    }
  }
}

Not sure what is happening on your cluster. Is it overloaded at the moment?

What is the output of:

GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v

If some outputs are too big, please share them on gist.github.com and link them here.

Maybe you could reduce scroll_size to 100 and try again? Or increase the scroll keep-alive, e.g. scroll=10m (I don't remember what the default value is).
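For illustration, the full request with those parameters might look like this (the timeout value here is just an example; note that scroll_size defaults to 1000, which matches the single batch of 1000 deletions in your output before the search context was lost):

POST /index_name/_delete_by_query?conflicts=proceed&scroll_size=100&scroll=10m&wait_for_completion=false
{
  "query": {
    "range": { "@timestamp": { "lte": "now-60d/d" } }
  }
}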

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.