Delete a large amount of data using _delete_by_query

Hello,

I am trying to remove a large amount of data from Elasticsearch using _delete_by_query. I've tried many options to get this to complete, but typically only a few hundred records (out of millions) are actually deleted. The most common error is:

{
  "took": 3307,
  "timed_out": false,
  "total": 140739907,
  "deleted": 102,
  "batches": 1,
  "version_conflicts": 77,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "logstash-2017.02.28",
      "type": "fluentd",
      "id": "AVqCEZwbJhYoxRiSZL1-",
      "cause": {
        "type": "es_rejected_execution_exception",
        "reason": "rejected execution of org.elasticsearch.transport.TransportService$7@1ac5f094 on EsThreadPoolExecutor[bulk, queue capacity = 50, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@795ac56a[Running, pool size = 2, active threads = 2, queued tasks = 50, completed tasks = 464576]]"
      },
      "status": 429
    }
  ]
}

Here is an example of the query I'm running; I've tried many different settings to no avail.

Does anyone know how to get this to slowly crawl through and remove all matching records, instead of erroring out?

POST logstash-*/_delete_by_query?conflicts=proceed
{
  "query": { 
    "query_string": {
      "default_field": "log",
      "analyze_wildcard": true, 
      "query": "DEV-*"
    }
  }
}
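
For reference, a throttled variant of the same request would look something like this (just a sketch; requests_per_second, scroll_size, and wait_for_completion are standard _delete_by_query parameters, but the values here are only illustrative):

POST logstash-*/_delete_by_query?conflicts=proceed&requests_per_second=500&scroll_size=500&wait_for_completion=false
{
  "query": {
    "query_string": {
      "default_field": "log",
      "analyze_wildcard": true,
      "query": "DEV-*"
    }
  }
}

requests_per_second throttles the batches so the bulk queue is not flooded, scroll_size shrinks each batch, and wait_for_completion=false runs the delete as a background task whose progress can be checked with the Tasks API (GET _tasks).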

Could you try running the query per index instead of using a wildcard?
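
For example, targeting one of the indices from the failure output above:

POST logstash-2017.02.28/_delete_by_query?conflicts=proceed
{
  "query": {
    "query_string": {
      "default_field": "log",
      "analyze_wildcard": true,
      "query": "DEV-*"
    }
  }
}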

FWIW, if you end up removing a lot of docs, it could be better to reindex the documents that will remain instead.
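
A minimal sketch of that approach, assuming a hypothetical target index named logstash-2017.02.28-clean and reusing the same query negated so that only the documents to keep are copied:

POST _reindex
{
  "source": {
    "index": "logstash-2017.02.28",
    "query": {
      "bool": {
        "must_not": {
          "query_string": {
            "default_field": "log",
            "analyze_wildcard": true,
            "query": "DEV-*"
          }
        }
      }
    }
  },
  "dest": {
    "index": "logstash-2017.02.28-clean"
  }
}

Once the reindex finishes, the old index can be dropped (DELETE logstash-2017.02.28) and an alias pointed at the new one, which avoids issuing millions of individual deletes entirely.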

Thanks @dadoonet -- I'll give that a try and report back :slight_smile:

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.