1. We created an index (routing_test) with 2 shards.
2. We created a couple of aliases on that index (routing_test_100 and routing_test_200) using:
PUT routing_test/_alias/routing_test_100
{
"routing": "100",
"filter" : {
"term" : {
"data.Id" : "100"
}
}
}
3. After creating the aliases, both aliases point to the same shard (found using a _search_shards query).
4. I indexed a document with ID 123 and data.Id as 100. And here are the scenarios I'm observing:
GET routing_test/_doc/123 => No results (Expected: Should return a document)
GET routing_test_200/_doc/123 => Found the doc (Expected: Should not return a document)
GET routing_test_100/_doc/123 => Found the doc (Expected: Should return a document)
I expect to be able to get the document from the original index regardless of any condition. Also, it should not return a document when I query a different alias. Am I missing something or doing something wrong here?
If you do not provide a routing key when you GET the document from the index, the ID will be hashed and the shard determined by this. It is therefore possible that the GET request will not find the document if the routing value resulted in a different shard when it was indexed.
In order to reliably get the document from the index without using the correct routing value, you would need to run a search, which naturally is less efficient than a GET and requires the document to be searchable.
As the aliases result in the same underlying shard they will always return the same documents. There is no filtering based on used routing value taking place, just data localisation.
What you are seeing is therefore expected.
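The behaviour described above can be reproduced with a small toy model. This is only a sketch: `ToyIndex`, `shard_for`, and the CRC32 hash are illustrative assumptions and not Elasticsearch's actual implementation, and the document ID and second routing value are picked programmatically so the demo does not depend on specific hash outputs.

```python
import zlib
from typing import Optional

NUM_SHARDS = 2  # same shard count as routing_test


def shard_for(key: str) -> int:
    """Map a routing key to a shard. CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


class ToyIndex:
    """Toy model of an index: each shard is a dict of doc_id -> source."""

    def __init__(self) -> None:
        self.shards = [{} for _ in range(NUM_SHARDS)]

    def index(self, doc_id: str, source: dict, routing: Optional[str] = None) -> None:
        # Without an explicit routing value, the document ID is the routing key.
        self.shards[shard_for(routing or doc_id)][doc_id] = source

    def get(self, doc_id: str, routing: Optional[str] = None) -> Optional[dict]:
        # A GET looks in exactly one shard and applies no filter.
        return self.shards[shard_for(routing or doc_id)].get(doc_id)

    def search(self, doc_id: str) -> Optional[dict]:
        # A search fans out to every shard, so it needs no routing value.
        for shard in self.shards:
            if doc_id in shard:
                return shard[doc_id]
        return None


ROUTING = "100"
candidates = [str(n) for n in range(1000)]
# A document ID whose own hash lands on a different shard than the routing value.
doc_id = next(c for c in candidates if shard_for(c) != shard_for(ROUTING))
# A different routing value that happens to land on the same shard.
other_routing = next(
    c for c in candidates if c != ROUTING and shard_for(c) == shard_for(ROUTING)
)

idx = ToyIndex()
idx.index(doc_id, {"data": {"Id": "100"}}, routing=ROUTING)

print(idx.get(doc_id))                         # None: looks in the wrong shard
print(idx.get(doc_id, routing=ROUTING))        # found: same shard as at index time
print(idx.get(doc_id, routing=other_routing))  # also found: same shard, no filter
print(idx.search(doc_id))                      # found: every shard is checked
```

The three GET calls correspond to the three requests in the question: no routing, the matching routing value, and a different routing value that resolves to the same shard.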
What is the problem you are trying to solve using these features?
Is there a syntax to GET document by Id (_doc) with routing key? That is what we are trying to attempt by using the alias.
Re: business case: We are trying to partition data (for reads) within an index and apply access control (i.e. limiting data visibility) without having to create separate indexes, which adds a lot of maintenance.
Yes, you can provide a routing key to a GET request using the routing query parameter, for example GET routing_test/_doc/123?routing=100. Using this for access control will however not work, as routing does not include any filter and just co-locates related data. If you have 2 shards, on average half the data will be visible for any routing value, irrespective of how many routing values you have in use. You will need to add filtering based on some field in the documents to restrict access.
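To make the "half the data" point concrete, here is a rough simulation. It assumes 50 hypothetical tenants with 20 documents each, a made-up `tenant` field, and CRC32 as a stand-in for the real routing hash, so the exact split will differ from a real cluster.

```python
import zlib

NUM_SHARDS = 2


def shard_for(key: str) -> int:
    """Map a routing key to a shard. CRC32 is a stand-in hash (assumption)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


# Hypothetical data set: tenants "100".."149", 20 documents per tenant.
docs = [{"tenant": str(t), "n": n} for t in range(100, 150) for n in range(20)]

# Each document is routed by its tenant value, so a tenant's data is co-located.
shards = [[] for _ in range(NUM_SHARDS)]
for doc in docs:
    shards[shard_for(doc["tenant"])].append(doc)


def visible_by_routing(tenant: str) -> list:
    # Routing alone only selects a shard; everything stored there is visible.
    return shards[shard_for(tenant)]


def visible_with_filter(tenant: str) -> list:
    # Adding a filter on a document field restricts the result to one tenant.
    return [d for d in visible_by_routing(tenant) if d["tenant"] == tenant]


for tenant in ("100", "101"):
    routed = visible_by_routing(tenant)
    filtered = visible_with_filter(tenant)
    share = len(routed) / len(docs)
    print(f"tenant {tenant}: routing only sees {len(routed)} docs "
          f"({share:.0%} of the index), routing plus filter sees {len(filtered)}")
```

With routing alone, each tenant sees every document on its shard, including other tenants' data; only the field filter narrows the result to that tenant's 20 documents.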
The best way to control access within an index, e.g. in a multi-tenant environment, is document level security, which is a commercial feature.