Updating routing values on documents


(Brent Evans) #1

Some of the documents I index need their routing value updated in response to user operations. To handle this, I detect when the routing has changed, delete the document, and then re-index it with the new routing value.
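As a rough illustration of that delete-then-re-index step, here is a minimal Python sketch. It assumes a client object exposing `delete` and `index` methods that accept `index`, `doc_type`, `id`, `routing` and `body` keyword arguments, similar to the official Python client of the 2.x era. The function name `change_routing` is hypothetical and not taken from my actual code.

```python
def change_routing(client, index, doc_type, doc_id, old_routing, new_routing, body):
    """Re-index a document under a new routing value.

    Returns True if the routing changed (delete + index),
    False if it was unchanged (plain index).
    `client` is assumed to behave like an Elasticsearch 2.x Python client.
    """
    changed = old_routing != new_routing
    if changed:
        # Remove the copy stored under the old routing value first,
        # otherwise it may be left behind on a different shard.
        client.delete(index=index, doc_type=doc_type, id=doc_id, routing=old_routing)
    # Index (or overwrite) the document under the routing value it should have now.
    client.index(index=index, doc_type=doc_type, id=doc_id, routing=new_routing, body=body)
    return changed
```

The delete has to use the old routing value so that it reaches the shard that currently holds the document.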

I'm now seeing a problem: after deleting a document and indexing it under a new routing value, a GET using the previous routing value still returns the document. I'm not 100% certain, but I believe this wasn't the case in ES 1.x and has only started happening since 2.0.

Example:

PUT test-index

PUT test-index/test/_mapping 
{
  "_routing": { "required": true}, 
  "properties": {
    "message" : {"type" : "string"}
  }
}

PUT test-index/test/test01?routing=test01
{
  "message" : "test01"
}

GET test-index/test/test01?routing=test01

DELETE test-index/test/test01?routing=test01

GET test-index/test/test01?routing=test01

PUT test-index/test/test01?routing=newrouting
{
  "message" : "test01"
}

GET test-index/test/test01?routing=test01
GET test-index/test/test01?routing=newrouting

The last two GETs both return the document:

{
  "_index": "test-index",
  "_type": "test",
  "_id": "test01",
  "_version": 3,
  "_routing": "newrouting",
  "found": true,
  "_source": {
    "message": "test01"
  }
}

I can work around this by manually checking the _routing field when the document is returned, but I thought I'd check whether this is expected behaviour or a bug before I raise a ticket.
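The workaround amounts to something like the following Python sketch. It assumes the GET response has already been parsed into a dict shaped like the JSON above. The helper name `source_if_routed` is hypothetical.

```python
def source_if_routed(response, expected_routing):
    """Return the document's _source only when the GET response was found
    AND its stored _routing matches the routing value we asked with.
    Otherwise return None, as if the document did not exist."""
    if not response.get("found"):
        return None
    if response.get("_routing") != expected_routing:
        # Document exists on the same shard but under another routing value.
        return None
    return response.get("_source")
```

With the response shown above, asking with "newrouting" yields the source, while asking with "test01" is treated as not found.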

Thanks,

Brent


(Adrien Grand) #2

In 2.0 we migrated from djb2 to murmur3 for hashing the routing value, in order to distribute documents more evenly across shards. What is happening here is that "test01" and "newrouting" resolve to the same shard id when there are 5 shards (with only 5 possible shards, collisions are likely). The routing value is only used to pick the shard; the document is then looked up by its id on that shard, so a GET with either routing value finds it. This is not a problem and actually happens quite frequently: expect it for roughly one fifth (assuming 5 shards) of the routing values that you change. It happened on 1.x as well, but for different routing keys.
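To illustrate the "one fifth" figure, here is a small Python sketch of hash-modulo shard selection. It is only an approximation under stated assumptions: it uses MD5 from the standard library as a stand-in hash, not the murmur3 function Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster. The names `shard_for` and `collision_rate` are made up for this example.

```python
import hashlib

def shard_for(routing, num_shards=5):
    """Pick a shard as hash(routing) mod num_shards.
    Assumption: MD5 stands in for Elasticsearch's real routing hash."""
    digest = hashlib.md5(routing.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_shards

def collision_rate(num_pairs=10000, num_shards=5):
    """Fraction of (old, new) routing pairs that land on the same shard."""
    same = sum(
        shard_for("old-%d" % i, num_shards) == shard_for("new-%d" % i, num_shards)
        for i in range(num_pairs)
    )
    return same / num_pairs

if __name__ == "__main__":
    # With 5 shards this should come out close to 1/5.
    print(collision_rate())
```

Whenever the old and new routing values select the same shard, a GET by id with the old routing value will still find the re-indexed document, which is the behaviour seen in the example above.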

