My case is this:
- Users can only search their own data, and the scores of the results should depend only on their own data.
- Users won't have much data.
I want to create an index per user. Will too many indices be a serious performance problem?
Creating an index per user is not good practice. I recommend making the user information part of each document and filtering queries on that particular user for better results.
Having an index per user tends to scale badly, and having lots of small indices can waste a lot of resources and be very inefficient. As Aravind pointed out, shared indices with the user's ID as a field you can filter on tend to scale and perform much better.
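As a rough illustration of the shared-index approach, here is a minimal sketch of a search body that restricts results to one user. The field names `user_id` and `content` are assumptions for the example, not something taken from this thread; `user_id` is assumed to be mapped as a keyword field.

```python
def build_user_query(user_id: str, text: str) -> dict:
    """Build a search body that matches `text` only within one user's documents.

    `content` and `user_id` are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: relevance comes from the full-text match.
                "must": [{"match": {"content": text}}],
                # Filter clause: restricts hits to one user without contributing to the score.
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


body = build_user_query("user-123", "quarterly report")
```

The resulting dictionary can be sent as the body of a search request against the shared index. Putting the user restriction in the `filter` clause keeps it out of scoring and lets it be cached.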
Maybe I should store the data in fewer indices and route by user key, but I found in the ES docs that the inverted index is stored per shard, so a user's search results may be affected by other users' data in the same shard.
I just want to know if it is worth creating indices per user to resolve this problem.
Having an index per user will work as long as the number of users is relatively small, but you are likely to run into problems as the number of indices grows, even if the total data volume is low. When multiple users share an index, they can, as you point out, affect the relevancy of each other's results, as term statistics are stored per shard.
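To make the relevancy effect concrete, here is a small worked sketch using the IDF term of BM25 as implemented in Lucene, `ln(1 + (N - n + 0.5) / (n + 0.5))`, where `N` is the number of documents in the shard and `n` is the number containing the term. The document counts are made-up numbers for illustration only.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """IDF component of BM25: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical numbers: the term is rare in this user's own 100 documents...
idf_own_index = bm25_idf(doc_count=100, doc_freq=2)

# ...but common across all users sharing a 10,000-document shard.
idf_shared_shard = bm25_idf(doc_count=10_000, doc_freq=5_000)

print(f"own index:    {idf_own_index:.2f}")
print(f"shared shard: {idf_shared_shard:.2f}")
```

With these numbers the term carries a much higher weight in the per-user index than in the shared shard. A `user_id` filter narrows which documents are returned, but the statistics used for scoring still come from the whole shard.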
One way to stretch the one-index-per-user model a bit further might be to use multiple small clusters instead of one big one, but this can also be cumbersome.
Thanks Christian. I think using aliases with routing by user ID is my best choice. Now I have another problem: there are two aliases on one index, and I want to move a document from one alias to the other without changing its ID. After the delete and create operations, I can find the document when searching the index directly, and the routing value is correct, but I can't find it when searching through the destination alias.
Why do you need to use aliases? Why not just add a user_id as a filter client side and run it against all shards?
Is this problem caused by the alias, rather than by the same ID being created repeatedly?
I use aliases because I want it to look like an index per user to the app developers. In my case users can basically only search their own data, so routing all data from the same user to the same shard makes sense to me, and it also helps performance.
But if the alias is causing the problem, I think I should use routing and a filter directly.
Having a huge number of aliases can also cause problems, as they take up space in the cluster state. If you are using routing, you need to delete and reindex data if you want to move it.
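The delete-and-reindex step for a routed document could be sketched roughly as below. The helper only builds the two REST requests as tuples and does not send them. The function name, the `user_id` field, and the assumption that the routing value equals the user ID are all hypothetical and chosen for illustration.

```python
from urllib.parse import quote


def move_document_requests(index, doc_id, old_routing, new_routing, source):
    """Return (method, path, body) tuples that move a document to a new routing value.

    Assumes the routing value is the user ID and that documents carry a
    hypothetical `user_id` field, which must be updated as well.
    """
    # Delete the old copy; the old routing value is needed to reach the right shard.
    delete_req = (
        "DELETE",
        f"/{index}/_doc/{quote(doc_id)}?routing={quote(old_routing)}",
        None,
    )
    # Index the document again under the same ID with the new routing value.
    new_source = {**source, "user_id": new_routing}
    index_req = (
        "PUT",
        f"/{index}/_doc/{quote(doc_id)}?routing={quote(new_routing)}",
        new_source,
    )
    return [delete_req, index_req]


requests_to_send = move_document_requests(
    "shared-index", "42", "alice", "bob", {"content": "hello", "user_id": "alice"}
)
```

Because routing decides which shard holds a document, a copy indexed under the same ID with a different routing value can end up on a different shard, which is why the old copy has to be deleted using its original routing value.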
Okay, I think I understand.
Thanks a lot!