Delete from all indexes using the Bulk API


(Simon Hutchinson) #1

Hi All,

My name is Si and I am new to ElasticSearch, so please be patient with my
naivety.
Firstly, congratulations to Shay and the ES community. So far the product
looks fantastic. We evaluated SolrCloud before ES, and ES simply destroyed
Solr for the multi-tenant EC2 deployment we are building.

Anyway, enough flattery and on to my issue.

I am using the rabbit-mq-river along with some custom code which tails the
oplog to index our MongoDB data in ES via RabbitMQ.

Each document in our MongoDB database contains the information (index,
type and id) needed to build the bulk API format used by the river:

e.g.

{ "index" : { "_index" : "someco", "_type" : "product", "_id" : "1" } }
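A minimal sketch of turning the index/type/id metadata stored with each Mongo document into one bulk-format entry, assuming the document body is plain JSON-serializable data. The function name `bulk_index_entry` and the sample document are hypothetical. The bulk body is newline-delimited: an action line followed by the source line, each on a single line and each terminated by a newline.

```python
import json

def bulk_index_entry(index, doc_type, doc_id, source):
    """Build one bulk 'index' action: metadata line + source line, newline-terminated."""
    action = {"index": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
    # Each line must be compact, single-line JSON (no pretty-printing).
    return json.dumps(action) + "\n" + json.dumps(source) + "\n"

if __name__ == "__main__":
    # Hypothetical document; in practice the body would come from the Mongo record.
    print(bulk_index_entry("someco", "product", 1, {"name": "widget"}), end="")
```

Several entries can simply be concatenated to form one bulk request body.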

However, when a document is deleted in Mongo, I only have access to its _id.
We have ensured that our _ids are universally unique, so I was hoping to
simply delete from all indexes, in a similar way to the REST Delete by
Query API:

e.g.

curl -XDELETE 'http://localhost:9200/_all/_query?q=id:1'
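For illustration, the same delete-by-query call can be built programmatically; this sketch only constructs the request and does not send it. It assumes a node at `localhost:9200` and the `/_all/_query` delete-by-query endpoint of ES versions from this era (later versions moved delete-by-query out of the core API). The helper name `delete_by_query_request` is hypothetical, and `field="id"` simply mirrors the curl example; whether to match on `id` or `_id` depends on your mapping.

```python
from urllib.parse import urlencode
from urllib.request import Request

def delete_by_query_request(doc_id, host="http://localhost:9200", field="id"):
    """Build (but do not send) a DELETE /_all/_query request matching field:doc_id."""
    # urlencode escapes the ':' in the Lucene query string as %3A.
    query = urlencode({"q": f"{field}:{doc_id}"})
    return Request(f"{host}/_all/_query?{query}", method="DELETE")

if __name__ == "__main__":
    req = delete_by_query_request(1)
    print(req.get_method(), req.full_url)
    # To actually send it, pass req to urllib.request.urlopen.
```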

However, a brief look at the source of
org.elasticsearch.action.delete.DeleteRequest suggests that this isn't
possible using the bulk API.
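One possible workaround, not proposed anywhere in this thread: since a bulk delete action names a concrete index and type, an id-only delete could be expanded into one delete line per known index/type pair. This sketch assumes you maintain that list of pairs yourself (for example from configuration); the `known` list and the function name `bulk_delete_fanout` are hypothetical. Each delete line then gets its own per-item result in the bulk response.

```python
import json

def bulk_delete_fanout(doc_id, targets):
    """Emit one bulk 'delete' action per known (index, type) pair.

    Hypothetical workaround: a bulk delete action must name a concrete
    index (and type), so an id-only delete is expanded over every target.
    """
    lines = []
    for index, doc_type in targets:
        action = {"delete": {"_index": index, "_type": doc_type, "_id": str(doc_id)}}
        lines.append(json.dumps(action))
    # Delete actions have no source line; a non-empty body still ends with a newline.
    return "\n".join(lines) + "\n" if lines else ""

if __name__ == "__main__":
    known = [("someco", "product"), ("otherco", "product")]  # hypothetical targets
    print(bulk_delete_fanout(1, known), end="")
```

The cost grows with the number of indexes, which is why a single delete-by-query against `_all` is attractive when ids are globally unique.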

I therefore have one or two questions:

  1. Am I correct that the bulk API format doesn't support deletion from all
    indexes?

If the answer to 1 is yes:

  2. Could someone point me to where in the source I should look to develop
    this functionality (delete by id from all indexes) in a customized
    version of the rabbit-mq-river?

Best regards

Si


(Shay Banon) #2

You can use the delete_by_query API, but that's not exposed in the bulk API
format. I suggest you either call ES directly instead of using the rabbitmq
river, or consume from RabbitMQ yourself and call ES.
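Following the suggestion to consume RabbitMQ yourself, here is a minimal sketch of just the dispatch step (no RabbitMQ or HTTP client code). The message shape (`op`, `index`, `type`, `id`, `doc`) and the name `plan_requests` are hypothetical; the delete URL reuses the `/_all/_query?q=id:...` form from the question, and the host is assumed to be `localhost:9200`. Index operations are batched into `_bulk` requests, while each id-only delete becomes a delete-by-query call.

```python
import json
from urllib.parse import urlencode

def plan_requests(messages, host="http://localhost:9200"):
    """Turn hypothetical change messages into an ordered list of (method, url, body).

    Assumed message shapes:
      {"op": "index", "index": ..., "type": ..., "id": ..., "doc": {...}}
      {"op": "delete", "id": ...}   # only the id is known
    """
    plan = []
    batch = []

    def flush():
        # Emit pending index operations as one newline-terminated bulk body.
        if batch:
            plan.append(("POST", f"{host}/_bulk", "\n".join(batch) + "\n"))
            batch.clear()

    for msg in messages:
        if msg["op"] == "index":
            meta = {"index": {"_index": msg["index"], "_type": msg["type"], "_id": str(msg["id"])}}
            batch.append(json.dumps(meta))
            batch.append(json.dumps(msg["doc"]))
        elif msg["op"] == "delete":
            flush()  # keep message order: earlier index ops go out before this delete
            query = urlencode({"q": f"id:{msg['id']}"})
            plan.append(("DELETE", f"{host}/_all/_query?{query}", None))
        else:
            raise ValueError(f"unknown op: {msg['op']!r}")
    flush()
    return plan
```

Flushing the batch before every delete is a design choice that preserves ordering (an index followed by a delete of the same id stays correct), at the cost of smaller bulk requests when deletes are frequent. Each planned tuple can then be sent with any HTTP client.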

(Replying to Simon Hutchinson, si@springyweb.com, Fri, Dec 16, 2011 at 3:33 PM)
