This is my use case: we are in the analytics domain. We track the traffic coming to our customers' websites and generate reports from the collected data.
We are planning to assign one index per user. Currently we have 33,000 active users, and this count may grow day by day. Each user generates a different amount of data: users whose websites get heavy traffic produce a lot of data, while low-traffic websites produce very little, so per-user data volume is hard to predict up front. Our data retention is 1 year.
Our current ES implementation:
- 40 indexes shared across the 33,000 users, i.e. each user is randomly assigned one of the 40 indexes.
- Each index has 20 primary shards and a replica factor of 1.
- Data nodes: 128 GB RAM, 1.6 TB storage, and 20 CPU cores each.
- Master nodes: 16 GB RAM, 256 GB storage, and 40 CPU cores each.
- Currently each index is shared by approximately 900 users.
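To make the scale of the question concrete, here is the rough shard arithmetic for the setup above, plus what a one-index-per-user layout would imply. The per-user layout (1 primary + 1 replica per user index) is a hypothetical assumption for illustration, not something we have configured:

```python
# Shard math for the current layout (numbers from this post).
indexes = 40
primaries_per_index = 20
replicas = 1  # replica factor 1

total_shards = indexes * primaries_per_index * (1 + replicas)
print(total_shards)  # 1600 shards cluster-wide

# Hypothetical one-index-per-user layout, assuming the smallest
# possible footprint of 1 primary + 1 replica per user index:
users = 33_000
per_user_shards = users * 1 * (1 + replicas)
print(per_user_shards)  # 66000 shards cluster-wide
```

Even with the smallest possible per-index footprint, index-per-user would multiply our shard count by roughly 40x, which is the part I am unsure the cluster can absorb.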
Current problems we are facing:
- Reports are slow for indexes holding large amounts of data. With the current approach, if one user has huge data, the other users sharing that index also see slow reports.
- GC pauses get longer when someone views a report backed by one of the larger indexes.
Doubts I have:
- Can I have one index per user?
- What is the shard limit per node?
- Will switching from CMS to G1GC solve the GC issue?
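For the last question, what I have in mind is roughly the following change in `config/jvm.options` (a sketch based on the G1 flags Elasticsearch ships commented out; I have not verified the exact values for our version):

```
## Disable the current CMS collector settings
# -XX:+UseConcMarkSweepGC
# -XX:CMSInitiatingOccupancyFraction=75
# -XX:+UseCMSInitiatingOccupancyOnly

## Enable G1GC instead
-XX:+UseG1GC
-XX:G1ReservePercent=25
-XX:InitiatingHeapOccupancyPercent=30
```

Would a change along these lines be expected to help with the long pauses on the large indexes, or is the root cause the index layout itself?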