Hi, I have an index with 10 shards.
My application has 10 users.
I'd like to assign each shard to a specific user, avoiding possible collisions in the hash function used in the formula that computes the shard number from the _routing parameter:
shard_num = hash(_routing) % num_primary_shards
Is there something similar to the routing mechanism that redirects my requests to a specific shard number?
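For a rough sense of how likely such a collision is, here is a minimal sketch. It assumes the hash spreads routing values uniformly and independently across shards, which is an idealisation and not a statement about the actual hash Elasticsearch uses. The function name prob_all_distinct is made up for this illustration.

```python
from math import factorial


def prob_all_distinct(num_users: int, num_shards: int) -> float:
    """Probability that every user lands on a different shard.

    Assumes an idealised hash: each routing value is mapped to one of
    num_shards shards uniformly at random, independently of the others.
    """
    if num_users > num_shards:
        # Pigeonhole: more users than shards always forces a collision.
        return 0.0
    # Number of ways to give num_users users pairwise distinct shards.
    distinct_assignments = factorial(num_shards) // factorial(num_shards - num_users)
    # Total number of ways the hash could assign shards to the users.
    all_assignments = num_shards ** num_users
    return distinct_assignments / all_assignments


if __name__ == "__main__":
    # 10 users routed into 10 shards, as in the question.
    print(prob_all_distinct(10, 10))
```

Under that assumption, 10 users on 10 shards end up on 10 different shards only about 0.036% of the time, so plain routing by user id will almost certainly put at least two users on the same shard.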
I think this doc might be helpful for you. There is also an Elastic blog post about custom routing that could be helpful.
Hi Thomas, thanks for your reply.
I also thought of using routing(userId); the problem in this scenario is that the shard number is computed using a hash function, so I cannot be sure that the same shard is not assigned to 2 (or more) different users.
The preference parameter in the Search API, with the value _shards:shardnumber, retrieves documents only from shard shardnumber, but I cannot find anything similar in the Document API that inserts a document into shard shardnumber.
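To make the search side concrete, here is a small sketch that only builds the request URL for a search restricted to one shard through the preference parameter. The host, the index name and the helper name shard_search_url are placeholders chosen for this example; nothing is sent to a cluster.

```python
from urllib.parse import urlencode


def shard_search_url(base_url: str, index: str, shard_num: int) -> str:
    """Build a _search URL restricted to a single shard number.

    Uses the preference=_shards:<n> form mentioned above. The colon is
    percent-encoded by urlencode, which is equivalent once decoded.
    """
    query = urlencode({"preference": f"_shards:{shard_num}"})
    return f"{base_url}/{index}/_search?{query}"


if __name__ == "__main__":
    # Hypothetical local node and index name, for illustration only.
    print(shard_search_url("http://localhost:9200", "my_index", 3))
```

This covers reads only. On the write side the shard is still chosen by the routing formula, which is the gap described above.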
I see, _routing uses the hash function either way, as you pointed out.
What would be the intention of this use case? Maybe there is an alternative solution that would work based on the use case.
Given that an index with N shards will be used by exactly N users, I'd like to have a one to one correspondence between the users and the shards:
index(document, uid1) --> document is stored in shard n1 = getshard(uid1)
index(document, uid2) --> document is stored in shard n2 = getshard(uid2)
index(document, uidN) --> document is stored in shard nN = getshard(uidN)
and for each i, j s.t. (1 <= i <= N and 1 <= j <= N and i != j) , ni != nj
And of course there should be a way to search in the correct shard using the uid:
search(query, uid1) will search only in the shard n1 = getshard(uid1)
Why not set them up as separate indices? Why can multiple users not share a shard?
The biggest problem is the number of users and its growth in the future, and consequently the amount of resources that the separate indices strategy would require.
Why can users not share shards? If each user needs a separate shard (or two), this is still going to scale badly if users have small amounts of data.
It's a requirement: we want to separate users' data in the index.
Having lots of small shards dedicated to specific users is unlikely to scale well, whether they are directly linked to an index or part of larger indices the way you describe. If you have a small number of users, I would recommend having separate indices per user. If you want this to scale to a large number of users, I would recommend you reconsider having dedicated shards per customer.
I haven't found a way to send docs to a specific shard, unfortunately. You pointed out two concerns:
"The biggest problem is the number of users and its growth in the future, and consequently the amount of resources that the separate indexes strategy would require"
One of the solutions @Christian_Dahlqvist mentioned may work well for you and solve both problems. If you used a shared index, you could have one index capable of supporting a growing number of users, with each user's documents residing on a single shard. You could then use aliases to give the perception of a single index per user. You get scalability and use fewer resources, since Elasticsearch would not send requests out to all the shards in the cluster, just to the one shard for your user. The Definitive Guide has a section dedicated to shared indexes.
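As a rough sketch of what the alias setup could look like, the snippet below builds the JSON body for the _aliases endpoint, with one filtered alias per user that also carries a routing value. The index name shared_index, the field name user_id, the user_<id> alias naming and the helper names are all assumptions made for this example, and the code only prints the body instead of sending it.

```python
import json


def user_alias_action(index: str, user_id: str, field: str = "user_id") -> dict:
    """One 'add' action: an alias that filters on, and routes by, the user id."""
    return {
        "add": {
            "index": index,
            "alias": f"user_{user_id}",  # hypothetical naming scheme
            "routing": user_id,          # requests via the alias go to one shard
            "filter": {"term": {field: user_id}},  # hide other users' documents
        }
    }


def aliases_body(index: str, user_ids: list) -> dict:
    """Request body that would be POSTed to /_aliases."""
    return {"actions": [user_alias_action(index, uid) for uid in user_ids]}


if __name__ == "__main__":
    print(json.dumps(aliases_body("shared_index", ["uid1", "uid2"]), indent=2))
```

Note that the routing value still goes through the hash, so two users can end up on the same shard. It is the filter that keeps each user's documents separate; the routing only limits each request to a single shard.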
We'll explore the shared index solution with aliases. Thomas and Christian, thanks for the support.
I am not sure having hundreds of thousands of aliases will scale well either as they are all kept in the cluster state.