Will the number of indices become a performance bottleneck?

wxywj1005 · December 16, 2018, 2:02pm

My case is this:

Users can only search their own data, and the scores of the results depend only on their own data.
Users won't have much data.

I want to create an index per user. Will too many indices be a big performance problem?

aravindputrevu · December 16, 2018, 2:40pm

It is not good practice to create an index per user. I recommend making the user information part of each document and filtering your queries on that particular user for better results.

Christian_Dahlqvist · December 16, 2018, 3:37pm

Having indices per user tends to scale badly, and having lots of small indices can waste a lot of resources and be very inefficient. As Aravind pointed out, shared indices with the user's ID stored as a field you can filter on tend to scale and perform much better.
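A minimal sketch of the shared-index approach, assuming each document carries a `user_id` keyword field and its text lives in a `content` field (both field names are hypothetical). The user restriction goes in the `filter` clause of a `bool` query, which runs in filter context: it does not affect scoring and can be cached, while the `match` clause in `must` does the relevance scoring.

```python
def user_scoped_query(user_id: str, text: str, user_field: str = "user_id") -> dict:
    """Search body that scores on `text` but only returns docs owned by `user_id`.

    Assumes hypothetical field names: `content` for text, `user_id` for the owner.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: full-text relevance on the (hypothetical) content field.
                "must": {"match": {"content": text}},
                # Non-scoring, cacheable clause: keep only this user's documents.
                "filter": [{"term": {user_field: user_id}}],
            }
        }
    }
```

The returned dict would be sent as the JSON body of `GET /<index>/_search`; without a routing value the search still fans out to all shards of the index, as described above.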

wxywj1005 · December 17, 2018, 2:43am

Maybe I should store the data in fewer indices and route by user key, but I found in the Elasticsearch docs that the inverted index is stored per shard, so a user's search results may be affected by other users' data in the same shard.
I just want to know whether it is worth creating an index per user to solve this problem.

Christian_Dahlqvist · December 17, 2018, 6:18am

Having an index per user will work as long as the number of users is relatively small, but you are likely to run into problems as the number of indices grows, even if the total data volume is low. When multiple users share an index they can, as you point out, affect each other's relevancy, because term statistics are stored per shard.
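To make the per-shard statistics point concrete, here is a small worked example, assuming the default BM25 similarity. The function follows the IDF term used by Lucene's `BM25Similarity`, and it treats every document in the shard as having the field; the document counts are invented purely for illustration.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF term of Lucene-style BM25, computed from one shard's statistics."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical: Alice owns 10 docs, 2 of which contain "invoice".
alice_only = bm25_idf(doc_count=10, doc_freq=2)

# Hypothetical: the same shard also holds other users' docs, 1000 in total,
# and 300 of those other docs contain "invoice" too (302 overall).
shared_shard = bm25_idf(doc_count=1000, doc_freq=302)

print(f"idf with only Alice's docs: {alice_only:.3f}")
print(f"idf in the shared shard:    {shared_shard:.3f}")
```

With these numbers the IDF drops from about 1.48 to about 1.20, so a match on "invoice" counts for less in Alice's scores purely because of other users' data, which is the effect described above.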

One way to push the one-index-per-user model a bit further might be to use multiple small clusters instead of one big one, but this can also be cumbersome.

wxywj1005 · December 19, 2018, 7:23am

Thanks Christian. I think using aliases with routing by user ID on shared indices is my best choice. Now I have another problem: there are two aliases on one index, and I want to move a document from one alias to the other without changing its ID. After deleting and re-creating the document, I can find it by searching the index directly, and its routing value is correct, but I can't find it through the destination alias.
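For reference, a per-user alias of the kind described here is usually a filtered alias with a routing value, created through the `_aliases` API. The sketch below builds that request body, assuming a `user_id` field and a hypothetical `user_<id>` naming scheme. A document is only visible through such an alias if it also matches the alias filter, so one thing worth checking (an assumption about this setup, not a confirmed diagnosis) is whether the re-created document still carries the old user's value in the filtered field.

```python
def user_alias_action(index: str, user_id: str) -> dict:
    """Body for POST /_aliases adding a filtered, routed alias for one user.

    The alias name pattern and the `user_id` field are hypothetical.
    """
    return {
        "actions": [
            {
                "add": {
                    "index": index,
                    "alias": f"user_{user_id}",
                    # Searches through the alias only see docs matching this filter.
                    "filter": {"term": {"user_id": user_id}},
                    # Requests through the alias are sent to this routing value's shard.
                    "routing": user_id,
                }
            }
        ]
    }
```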

Christian_Dahlqvist · December 19, 2018, 7:35am

Why do you need to use aliases? Why not just add a filter on user_id client side and run the query against all shards?

wxywj1005 · December 19, 2018, 8:02am

Is the alias causing this problem, rather than the same ID being created repeatedly?
I use aliases because I want it to look like an index per user to app developers. My case is that users can basically only search their own data, so routing data from the same user to the same shard makes sense to me, and it also helps performance.
But if the alias causes the problem, I think I should use routing and a filter directly.
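Using routing and a filter directly, without aliases, could look roughly like this. The sketch assumes 7.x-style `/<index>/_doc/<id>` paths and the same hypothetical `user_id` and `content` fields. The key point is that the same routing value must be used when indexing and when searching; routing only picks the shard, so the `user_id` filter is still needed because other users' documents may land on the same shard.

```python
from urllib.parse import quote, urlencode


def index_request(index: str, doc_id: str, user_id: str, doc: dict) -> tuple[str, dict]:
    """Path and body for PUT /<index>/_doc/<id>?routing=<user_id>."""
    path = f"/{quote(index)}/_doc/{quote(doc_id)}?{urlencode({'routing': user_id})}"
    # Store the owner in the document so searches can filter on it.
    return path, {**doc, "user_id": user_id}


def search_request(index: str, user_id: str, text: str) -> tuple[str, dict]:
    """Path and body for GET /<index>/_search?routing=<user_id> with an owner filter."""
    path = f"/{quote(index)}/_search?{urlencode({'routing': user_id})}"
    body = {
        "query": {
            "bool": {
                "must": {"match": {"content": text}},
                # Routing narrows the search to one shard; the filter narrows it to one user.
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }
    return path, body
```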

Christian_Dahlqvist · December 19, 2018, 8:36am

Having a huge number of aliases can also cause problems, as they take up space in the cluster state. If you are using routing, you need to delete and reindex the data if you want to move it.
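The delete-and-reindex step can be sketched as a single `_bulk` request. This assumes a recent Elasticsearch version where bulk metadata uses `routing` (older 6.x releases also expected a `_type`), and the `user_id` field is again hypothetical. The delete has to carry the old routing value, otherwise it may target a different shard and leave the original document in place, which is one way to end up with two copies under the same ID.

```python
import json


def move_document_bulk(index: str, doc_id: str, old_user: str, new_user: str, source: dict) -> str:
    """NDJSON body for POST /_bulk that moves a routed document to another user."""
    actions = [
        # Delete with the OLD routing value so it hits the shard holding the doc.
        {"delete": {"_index": index, "_id": doc_id, "routing": old_user}},
        # Re-index with the NEW routing value and update the owner field as well.
        {"index": {"_index": index, "_id": doc_id, "routing": new_user}},
        {**source, "user_id": new_user},
    ]
    # The bulk API expects newline-delimited JSON ending with a newline.
    return "\n".join(json.dumps(a) for a in actions) + "\n"
```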

wxywj1005 · December 19, 2018, 8:47am

Okay, I think I understand.
Thanks a lot!

system · January 16, 2019, 8:47am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
