Elastic Index Allocation

Abhishek_Shinde · May 5, 2021, 8:25pm

We have a typical deployment of 5000 users per server. We are planning to use Elasticsearch for indexing new data from now on. Our earlier indexing engine had a provision to map one index to one user, which made it easier to restore a single user's index in case of an individual failure.

With Elasticsearch I see there is a limit of 1000 shards per server. One Elasticsearch index can contain one or more shards. Segment-level allocation control is not available.

In either case, I need to map multiple users to a single Elasticsearch index. If any single user's items fail, I may need to restore or repair the entire Elasticsearch index.
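
The arithmetic behind this can be sketched as follows. This is only an illustration using the figures from the post (5000 users per server, 1000 shards per node, which is the default of the `cluster.max_shards_per_node` setting). The helper `shards_needed` and the choice of one primary shard and no replicas per index are assumptions made for the example.

```python
import math

# Figures taken from the post above.
USERS_PER_SERVER = 5000
MAX_SHARDS_PER_NODE = 1000  # default value of cluster.max_shards_per_node


def shards_needed(users, users_per_index, primaries_per_index=1, replicas=0):
    """Total shard count for a given users-per-index mapping (hypothetical helper).

    Both primary and replica shards count towards the limit.
    """
    indices = math.ceil(users / users_per_index)
    return indices * primaries_per_index * (1 + replicas)


# One index per user, with one primary shard and no replicas (assumed values).
per_user = shards_needed(USERS_PER_SERVER, users_per_index=1)

# Smallest number of users per index that keeps the server within the limit.
min_users_per_index = math.ceil(USERS_PER_SERVER / MAX_SHARDS_PER_NODE)

print("index per user:", per_user, "shards, over the limit:", per_user > MAX_SHARDS_PER_NODE)
print("minimum users per index:", min_users_per_index)
```

Even with the smallest possible indices, an index per user exceeds the limit several times over, which is why several users have to share an index.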

I want to avoid unnecessarily overwriting data for unaffected users during a restore or repair.

Can anyone tell me the best way to tackle this problem?

Christian_Dahlqvist · May 5, 2021, 8:58pm

How much data does each user have? What is the total expected data volume?

Abhishek_Shinde · May 6, 2021, 8:40am

Typically, an average of 60k items per user.

As I mentioned above, there could be an average of 5k users per server.

Christian_Dahlqvist · May 6, 2021, 8:41am

How are you exposing the data to the users? Custom UI?

Abhishek_Shinde · May 6, 2021, 8:45am

We have an end-user interface using a SOAP API.

Christian_Dahlqvist · May 6, 2021, 8:52am

Then I would propose putting all users in a single index with a suitable number of primary shards. You can use routing to minimise the number of shards queried and add a user filter at the application layer to ensure each user sees the correct data. This will scale much better and be more efficient than an index per user. This does assume there are no mapping conflicts between users though.
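
A minimal sketch of what routing plus an application-layer user filter could look like. It only builds the REST requests as plain Python dictionaries, so it runs without a cluster. The index name `user-items`, the field name `user_id` and the helper functions are hypothetical; the `routing` query parameter and the `bool`/`filter`/`term` query are standard Elasticsearch features.

```python
import json

INDEX = "user-items"    # hypothetical index name
USER_FIELD = "user_id"  # hypothetical keyword field holding the item's owner


def index_request(user_id, doc_id, item):
    """Request that indexes one item, routed by its owner."""
    return {
        "method": "PUT",
        "path": f"/{INDEX}/_doc/{doc_id}",
        "params": {"routing": user_id},
        # Store the owner on the document so it can be filtered on later.
        "body": {**item, USER_FIELD: user_id},
    }


def search_request(user_id, user_query):
    """Search request limited to the user's shard and the user's documents."""
    return {
        "method": "POST",
        "path": f"/{INDEX}/_search",
        # Routing sends the search only to the shard holding this user's data.
        "params": {"routing": user_id},
        "body": {
            "query": {
                "bool": {
                    "must": [user_query],
                    # Other users can hash to the same shard, so the filter
                    # is still needed to hide their documents.
                    "filter": [{"term": {USER_FIELD: user_id}}],
                }
            }
        },
    }


req = search_request("user-42", {"match": {"subject": "invoice"}})
print(json.dumps(req, indent=2))
```

The same routing value has to be used when indexing and when searching, otherwise the search may be sent to a shard that does not hold the user's documents.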

Abhishek_Shinde · May 7, 2021, 4:47am

We are planning a similar approach. Around 1000 users will map to a single Elasticsearch index containing 2 shards of 50 GB each.

To minimise the impact if any index goes down, we are limiting the mapping to 1000 users per index. Let me know if you see any issues or a better approach here.

Secondly, restoring from a snapshot works at the index level. If there are any issues with a single user's indexed items, then restoring the index would overwrite other users' items unnecessarily. Any idea how to overcome this problem?

Christian_Dahlqvist · May 7, 2021, 5:41am

I do not see a need to limit it to 1000 users per index. It just adds a step of identifying the correct index for the user without much benefit. If you use routing you can also speed up searching by having a reasonably large number of primary shards, e.g. 100.

Restoring an index will indeed affect all users. You can however restore the index under a different name, then delete the affected user's data from the live index and reindex it from the restored copy, as each user has relatively little data.
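
The restore-under-a-different-name approach could be sketched as the sequence of REST requests below. This is an illustration, not a tested procedure: the repository, snapshot and index names, the `user_id` field and the helper `repair_user_requests` are hypothetical, and it assumes documents were indexed with the user ID as the routing value. The endpoints used (`_restore` with `rename_pattern`/`rename_replacement`, `_delete_by_query`, `_reindex`) are standard Elasticsearch APIs.

```python
def repair_user_requests(repo, snapshot, index, user_id, user_field="user_id"):
    """Build the REST requests that repair one user's data (hypothetical helper)."""
    restored = f"restored-{index}"
    user_query = {"term": {user_field: user_id}}
    return [
        # 1. Restore the snapshot copy under a new name, leaving the live index alone.
        {
            "method": "POST",
            "path": f"/_snapshot/{repo}/{snapshot}/_restore",
            "params": {"wait_for_completion": "true"},
            "body": {
                "indices": index,
                "include_global_state": False,
                "include_aliases": False,  # avoid clashing with live aliases
                "rename_pattern": "(.+)",
                "rename_replacement": "restored-$1",
            },
        },
        # 2. Remove only the affected user's documents from the live index.
        {
            "method": "POST",
            "path": f"/{index}/_delete_by_query",
            "params": {"routing": user_id, "refresh": "true"},
            "body": {"query": user_query},
        },
        # 3. Copy that user's documents back from the restored copy.
        #    Reindex keeps the original routing value by default.
        {
            "method": "POST",
            "path": "/_reindex",
            "params": {"refresh": "true"},
            "body": {
                "source": {"index": restored, "query": user_query},
                "dest": {"index": index},
            },
        },
        # 4. Drop the temporary restored index.
        {"method": "DELETE", "path": f"/{restored}", "params": {}, "body": None},
    ]


steps = repair_user_requests("my_repo", "snap_1", "user-items", "user-42")
for step in steps:
    print(step["method"], step["path"])
```

Other users' documents in the live index are never touched, because steps 2 and 3 are both restricted to the affected user by the same term query.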

system · June 6, 2021, 2:44am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
