Here is the original question asked on Stack Overflow: elasticsearch - ES 1.5 Delete By Query API not working - Stack Overflow
I am using an old version of Elasticsearch: 1.5.
Problem: I need to delete a lot of documents, from a few hundred thousand up to a few million. I have all the info about the records, including their _ids, so an array of _ids is what I want to use.
Scale problem: I previously ran this deletion in a loop, but ES behaves inconsistently when many operations are issued in rapid succession. So I decided to look for a bulk delete.
I am trying to make use of the delete by query API.
The docs state:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}
'
What I'm doing:
curl -XDELETE 'http://localhost:9200/my_index/logs/_query' -d '{
"query" : {
"terms" : { "_id" : ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC"] }
}
}
'
The response is:
{
"found":false,
"_index":"my_index",
"_type":"logs",
"_id":"_query",
"_version":1,
"_shards":{"total":2, "successful":1, "failed":0}
}
It looks like ES is confused because it seems to be looking for a document with an _id of _query... Very strange.
And it does not remove any of the documents. How do I make it work and actually delete these records?
If you are deleting by id, I would recommend using the bulk API instead. Issuing a delete by query operation for a small set of unique ids does not sound very efficient to me.
Thanks for the reply! I will look into the bulk API, but deleting by query looks the most natural to me. Why do you see this approach as inefficient?
If you wanted to delete based on the results of a query, it would be the natural choice. As you have the document ids, using the bulk API to explicitly delete multiple documents at a time is more natural in my opinion. This approach can also be parallelised and has minimal overhead.
I am trying the following:
bulk_delete = "{\":delete\":{\":_index\":\"my_index\",\":_type\":\"logs\",\":_id\":\"AVTD6fhLAn35BG25xbZz\"}}\n"
Typhoeus.post("http://elastic_host/my_index/logs/_bulk", body: bulk_delete)
And the response I get is as follows:
response.body
#=> {"found"=>false, "_index"=>"my_index", "_type"=>"logs", "_id"=>"_bulk", "_version"=>1}
response.response_headers
HTTP/1.1 404 Not Found
Access-Control-Allow-Origin: *
Content-Type: application/json; charset=UTF-8
Content-Length: 108
Connection: keep-alive
response.code
#=> 404
I don't see where I'm going wrong. Thank you very much for taking a look!
Why does it say :delete, :_index, :_type and :_id in your bulk_delete string?
Should it not?
It is how I understood it from the docs: I have to provide a method (delete), index, type and id of the object to delete, no?
https://www.elastic.co/guide/en/elasticsearch/reference/1.5/docs-bulk.html:
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
Andrey_Deineko:
":delete"
The example from the docs is correct, but your example looks wrong (see example above). You seem to have a few too many : characters in your string.
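One way to avoid hand-escaping mistakes like the stray colons is to build the action as a Ruby hash and let the standard json library serialise it. This is only a minimal sketch: the helper name delete_action is hypothetical, and the default index and type are the names used in this thread.

```ruby
require "json"

# Hypothetical helper: build one bulk "delete" action line for a document id.
# The index and type defaults are the names used in this thread.
def delete_action(id, index: "my_index", type: "logs")
  action = { "delete" => { "_index" => index, "_type" => type, "_id" => id } }
  # JSON.generate emits plain "delete", "_index", ... keys with no leading colons.
  JSON.generate(action)
end

puts delete_action("AVTD6fhLAn35BG25xbZz")
```

The printed line has the same structure as the example from the docs, just without the optional spaces.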
Oh, I see what you mean.
I tried the following now:
bulk_delete = "{ \"delete\" : { \"_index\" : \"my_index\", \"_type\" : \"logs\", \"_id\" : \"AV_v9fh4g0tG8Fm0jtzS\" } }"
puts bulk_delete
#=> { "delete" : { "_index" : "my_index", "_type" : "logs", "_id" : "AV_v9fh4g0tG8Fm0jtzS" } }
Typhoeus.post("http://#{AppConfig.elastic_host}/my_index/logs/_bulk", body: bulk_delete)
and the response code is 400 with the following info:
{"error":"ActionRequestValidationException[Validation Failed: 1: no requests added;]","status":400}
You need a newline after each bulk action, even the last one. Try this:
curl -XPOST 'localhost:9200/_bulk?pretty' -H 'Content-Type: application/json' -d'
{ "delete" : { "_index" : "my_index", "_type" : "logs", "_id" : "AV_v9fh4g0tG8Fm0jtzS" } }
'
Changed my payload to the following (added new line char to the end)
"{ \"delete\" : { \"_index\" : \"my_index\", \"_type\" : \"logs\", \"_id\" : \"AV_v9fh4g0tG8Fm0jtzS\" } }\n"
and it worked like a charm.
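For the original goal of deleting a large array of _ids, the same fix can be applied to whole batches. The following is a rough sketch, not a tested production routine: the helper name bulk_delete_bodies and the batch size are assumptions, and the actual HTTP call is left out so the snippet runs without a cluster.

```ruby
require "json"

# Hypothetical helper: turn an array of ids into bulk request bodies,
# one body per batch. Every action line, including the last one in each
# body, is terminated with "\n".
def bulk_delete_bodies(ids, index:, type:, batch_size: 1000)
  ids.each_slice(batch_size).map do |batch|
    batch.map { |id|
      JSON.generate({ "delete" => { "_index" => index, "_type" => type, "_id" => id } }) + "\n"
    }.join
  end
end

ids = ["AVTD6fhLAn35BG25xbZz", "AVTD6fhLAn35BG25xbaC", "AV_v9fh4g0tG8Fm0jtzS"]
# batch_size of 2 is only to show the splitting; a real value is a tuning choice.
bodies = bulk_delete_bodies(ids, index: "my_index", type: "logs", batch_size: 2)
bodies.each { |body| print body }
```

Each element of bodies could then be posted to the _bulk endpoint in the same way as the single-action payload above, for example with Typhoeus.post and body: set to that string.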
Thank you so much Christian for your involvement and help - very much appreciated!