A typical deployment of ours has 5000 users per server. We are planning to use Elasticsearch to index new data from now on. Our earlier indexing engine let us map one index to one user, which made it easy to restore a single user's index after an individual failure.
With Elasticsearch, I see there is a limit of 1000 shards per server. One Elasticsearch index can contain one or more shards. Segment-level allocation control is not available.
In either case, I need to map multiple users to a single Elasticsearch index. If a single user's items fail, I may need to restore or repair the entire Elasticsearch index.
I want to avoid unnecessarily overwriting data for non-impacted users during a restore or repair.
Can anyone tell me the best way to tackle this problem?
How much data does each user have? What is the total expected data volume?
Typically, about 60k items per user on average.
As I mentioned above, there are about 5k users per server on average.
How are you exposing the data to the users? Custom UI?
We have an end-user interface that uses a SOAP API.
Then I would propose putting all users in a single index with a suitable number of primary shards. You can use routing to minimise the number of shards queried and add a user filter at the application layer to ensure each user sees the correct data. This will scale much better and be more efficient than an index per user. This does assume there are no mapping conflicts between users though.
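As a rough illustration of the routing plus user filter idea, here is a minimal Python sketch that only builds the request parameters. The index name `user-items` and the fields `user_id` and `content` are hypothetical, and `user_id` is assumed to be mapped as a keyword. The `routing` parameter and the `bool`/`filter`/`term` query are standard Elasticsearch features, but how you pass them depends on the client you use.

```python
def build_index_request(user_id, item_id, doc, index="user-items"):
    """Parameters for indexing one item, routed by user (names are hypothetical)."""
    return {
        "index": index,
        "id": item_id,
        "routing": user_id,  # all items of one user land on the same shard
        "body": {**doc, "user_id": user_id},  # store the owner for filtering
    }


def build_user_search(user_id, query_text, index="user-items"):
    """Parameters for a search that only touches and returns one user's items."""
    return {
        "index": index,
        "routing": user_id,  # query only the shard this user's items live on
        "body": {
            "query": {
                "bool": {
                    "must": [{"match": {"content": query_text}}],
                    # Routing alone is not isolation: other users share the
                    # shard, so the application must always add this filter.
                    "filter": [{"term": {"user_id": user_id}}],
                }
            }
        },
    }


if __name__ == "__main__":
    print(build_user_search("user-42", "quarterly report"))
```

The same routing value has to be used at index time and at search time, otherwise the search may look at the wrong shard.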
We are planning a similar approach. Around 1000 users will map to a single Elasticsearch index containing 2 shards of 50 GB each.
To minimise the impact if any index goes down, we are limiting the mapping to 1000 users per index. Let me know if you see any issues or a better approach.
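For reference, the numbers in this thread work out roughly as follows. This is only back-of-the-envelope arithmetic, assuming primary shards only (no replicas), 1 GB = 10^9 bytes, and that the 50 GB per shard figure holds at the average item count.

```python
# Figures quoted in this thread
USERS_PER_SERVER = 5000
ITEMS_PER_USER = 60_000
USERS_PER_INDEX = 1000
SHARDS_PER_INDEX = 2
SHARD_SIZE_GB = 50

indexes_per_server = USERS_PER_SERVER // USERS_PER_INDEX   # 5 indexes
shards_per_server = indexes_per_server * SHARDS_PER_INDEX  # 10 primary shards
items_per_index = USERS_PER_INDEX * ITEMS_PER_USER         # 60,000,000 items
index_size_gb = SHARDS_PER_INDEX * SHARD_SIZE_GB           # 100 GB per index
data_per_server_gb = indexes_per_server * index_size_gb    # 500 GB per server

# Implied average size per item (assumption: 1 GB = 10**9 bytes)
bytes_per_item = index_size_gb * 10**9 / items_per_index

if __name__ == "__main__":
    print(shards_per_server, items_per_index, data_per_server_gb, round(bytes_per_item))
```

So this layout stays far below the 1000 shard limit, at about 10 primary shards per server.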
Secondly, restoring from a snapshot works at the index level. If there are issues with a single user's indexed items, restoring the index would unnecessarily overwrite other users' items. Any idea how to overcome this problem?
I do not see a need to limit it to 1000 users per index. It just adds a step of identifying the correct index for the user without much benefit. If you use routing, you can also speed up searching by having a reasonably large number of primary shards, e.g. 100.
Restoring an index will indeed affect all users. You can, however, restore the index under a different name, then delete the affected user's data from the live index and reindex it from the restored copy. This is cheap because each user has relatively little data.
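A minimal sketch of that per-user repair, written as a plan of REST calls rather than against a specific client. The repository, snapshot and index names and the `user_id` field are hypothetical. The endpoints used (snapshot restore with `rename_pattern`/`rename_replacement`, `_delete_by_query`, `_reindex`) are standard Elasticsearch APIs, but check the options against the version you run.

```python
def per_user_repair_plan(user_id, repo="my_repo", snapshot="snap_1",
                         index="user-items", prefix="restored-"):
    """Return (method, path, params, body) tuples; all names are hypothetical."""
    restored = prefix + index
    user_query = {"term": {"user_id": user_id}}
    return [
        # 1. Restore the snapshot next to the live index, under a new name.
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         {"wait_for_completion": "true"},
         {"indices": index,
          "rename_pattern": "(.+)",
          "rename_replacement": prefix + "$1"}),
        # 2. Remove only the affected user's items from the live index.
        ("POST", f"/{index}/_delete_by_query",
         {"routing": user_id, "refresh": "true"},
         {"query": user_query}),
        # 3. Copy that user's items back from the restored copy.
        #    Reindex keeps the source documents' routing by default.
        ("POST", "/_reindex", {},
         {"source": {"index": restored, "query": user_query},
          "dest": {"index": index}}),
        # 4. Drop the temporary restored index.
        ("DELETE", f"/{restored}", {}, None),
    ]


if __name__ == "__main__":
    for step in per_user_repair_plan("user-42"):
        print(step)
```

Other users' items in the live index are never touched, because both the delete and the reindex are restricted to the one `user_id`.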