I am using the following API to delete documents older than 60 days:
POST /index_name/_delete_by_query?conflicts=proceed
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-60d/d" }
    }
  }
}
My index is quite big: 1.2 TB. When I run this API, it deletes at most 1000 documents and then quits.
dadoonet (David Pilato), January 7, 2024, 11:51am
Welcome.
That's really inefficient. A delete request actually writes more data to disk first, and the documents are only removed later, when segments are merged.
Instead, use time-based indices and simply delete the indices you don't need anymore.
You can use ILM to automate all that.
See ILM: Manage the index lifecycle | Elasticsearch Guide [8.11] | Elastic
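For example, a minimal policy sketch (the policy name delete-after-60d is just a placeholder here; note that min_age counts from index creation or rollover, not per document):

PUT _ilm/policy/delete-after-60d
{
  "policy": {
    "phases": {
      "delete": {
        "min_age": "60d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}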
I know index deletion is faster. In fact, we are planning to move to daily indices in the near future. However, until then I have to remove old documents. Could you please help me optimize the query, if there is any way?
dadoonet (David Pilato), January 7, 2024, 12:38pm
I think it'd be better to reindex the data you want to keep instead.
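Something along these lines, for example (a sketch; index_name-keep is a placeholder for the destination index, and the range is inverted so only the documents you want to keep are copied):

POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "index_name",
    "query": {
      "range": {
        "@timestamp": { "gt": "now-60d/d" }
      }
    }
  },
  "dest": {
    "index": "index_name-keep"
  }
}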
What is your version?
dadoonet (David Pilato), January 7, 2024, 1:16pm
You can try adding wait_for_completion=false so it will run asynchronously.
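For example (a sketch; the node and task ids below are placeholders, the real ones come back in the response):

POST /index_name/_delete_by_query?conflicts=proceed&wait_for_completion=false
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-60d/d" }
    }
  }
}

# The response only contains a task id, e.g. {"task": "<node_id>:<task_id>"}.
# You can then check its progress with:
GET _tasks/<node_id>:<task_id>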
I tried that, and I got the result below:
{
  "completed" : true,
  "task" : {
    "node" : "Z19SnYRVRTqf9G_kgaC4Yg",
    "id" : 1623007937,
    "type" : "transport",
    "action" : "indices:data/write/delete/byquery",
    "status" : {
      "total" : 18231930,
      "updated" : 0,
      "created" : 0,
      "deleted" : 1000,
      "batches" : 1,
      "version_conflicts" : 0,
      "noops" : 0,
      "retries" : {
        "bulk" : 0,
        "search" : 0
      },
      "throttled_millis" : 0,
      "requests_per_second" : -1.0,
      "throttled_until_millis" : 0
    },
    "description" : "delete-by-query [index-name]",
    "start_time_in_millis" : 1704633497441,
    "running_time_in_nanos" : 493323541261,
    "cancellable" : true,
    "cancelled" : false,
    "headers" : { }
  },
  "error" : {
    "type" : "search_phase_execution_exception",
    "reason" : "all shards failed",
    "phase" : "query",
    "grouped" : true,
    "failed_shards" : [
      {
        "shard" : -1,
        "index" : null,
        "reason" : {
          "type" : "search_context_missing_exception",
          "reason" : "No search context found for id [30575029]"
        }
      }
    ],
    "caused_by" : {
      "type" : "search_context_missing_exception",
      "reason" : "No search context found for id [30575029]"
    }
  }
}
dadoonet (David Pilato), January 8, 2024, 9:14am
Not sure what is happening on your cluster. Is it overloaded at the moment?
What is the output of:
GET /
GET /_cat/nodes?v
GET /_cat/health?v
GET /_cat/indices?v
If some outputs are too big, please share them on gist.github.com and link them here.
Maybe you could reduce the scroll_size to 100 and try again? Or increase scroll to 1m (I don't remember what the default value is).
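For what it's worth, the deleted count of 1000 with a single batch matches the default scroll_size of 1000, and the search_context_missing_exception suggests the scroll context expired before the second batch could run. A sketch combining both parameters (the default scroll timeout is 5m, so extending it, e.g. to 10m, rather than shortening it is probably what helps here; the values are just starting points to experiment with):

POST /index_name/_delete_by_query?conflicts=proceed&scroll_size=100&scroll=10m&wait_for_completion=false
{
  "query": {
    "range": {
      "@timestamp": { "lte": "now-60d/d" }
    }
  }
}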