ES 1.5 Delete By Query API not working

Here is the original question asked on Stackoverflow:

I am using an old version on ElasticSearch - 1.5.

Problem: I need to delete a lot of documents, like few hundred thousands up to few millions. I have all the info about the records, including it's _ids - so array of _ids is what I want to use.

Scale problem: I had this deletion in the loop before, but ES is inconsistent when performing a lot of subsequent operations in a high speed. Thus I decided to look for a bulk delete.

I am trying to make use of delete by query API.

Docs states:

curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
    "query" : {
        "term" : { "user" : "kimchy" }

What I'm doing:

curl -XDELETE 'http://localhost:9200/my_index/logs/_query' -d '{
  "query" : {
    "terms" : { "_id" : ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"] }

The response is:

  "_shards":{"total":2, "successful":1, "failed":0}

It looks like ES is confused because it seems to be looking for a document with _id of _query... Very strange.

And it does not remove any of documents. How do I make it work and actually delete these records?

If you are deleting by id I would recommend usi8ng the bulk API instead. Issuing a delete by query operation for a small set of unique ids does not sound very efficient to me.

Thanks for the reply! I will look into bulk api, but the deleting by query looks the most natural to me - why do you see this approach inefficient?

If you wanted to delete based on the results of a query, it would be the natural choice. As you have the document ids, using the bulk API to explicitly delete multiple documents at a time is more natural in my opinion. This approach can also be parallelised and has minimal overhead.

I am trying the following:

bulk_delete = "{\":delete\":{\":_index\":\"my_index\",\":_type\":\"logs\",\":_id\":\"AVTD6fhLAn35BG25xbZz\"}}\n""http://elastic_host/my_index/logs/_bulk",  body: bulk_delete)

And the response I get is as follows:

#=> {"found"=>false, "_index"=>"my_index", "_type"=>"logs", "_id"=>"_bulk", "_version"=>1}

HTTP/1.1 404 Not Found
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=UTF-8
Content-Length: 108
Connection: keep-alive
#=> 404

I don't see where I go wrong about it.. Thank you very much for taking a look!

Why does it say :delete, :_index, :_type and :_id in your bulk_delete string?

Should it not?

It is how I understood it from docs, - I have to provide a method (delete), index, type and id of the object to delete, no?

{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }

The example from the docs is correct, but your example looks wrong (see example above). You seem to have a few too many : in your string.

Oh, I see what you mean.

I tried the following now:

bulk_delete = "{ \"delete\" : { \"_index\" : \"my_index\", \"_type\" : \"logs\", \"_id\" : \"AV_v9fh4g0tG8Fm0jtzS\" } }"
puts bulk_delete
#=> { "delete" : { "_index" : "my_index", "_type" : "logs", "_id" : "AV_v9fh4g0tG8Fm0jtzS" } }"http://#{AppConfig.elastic_host}/my_index/logs/_bulk", body: bulk_delete)

and the response code is 400 with the following info:

{"error":"ActionRequestValidationException[Validation Failed: 1: no requests added;]","status":400}

You need a newline after each bulk action, even the last one. Thy this:

curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "delete" : { "_index" : "my_index", "_type" : "logs", "_id" : "AV_v9fh4g0tG8Fm0jtzS" } }
1 Like

Changed my payload to the following (added new line char to the end)

"{ \"delete\" : { \"_index\" : \"my_index\", \"_type\" : \"logs\", \"_id\" : \"AV_v9fh4g0tG8Fm0jtzS\" } }\n"

and worked like charm.

Thank you so much Christian for your involvement and help - very much appreciated!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.