Need suggestion on sharding for efficiency

rahulcse · March 25, 2016, 12:07am

We are trying to design a search engine for our customers.
We have many customers: some have small data, some have medium-sized data, and some have really big data.
We are planning to go with one index per customer.
For small customers we want fewer shards, and for medium and larger ones we want an optimal number of shards.
Can anyone suggest the right approach for this? Please share your views; it will be helpful.

Thanks in advance for your help.

Regards,
Rahul

warkolm · March 25, 2016, 12:26am

You are better off having a single shared index for small users that leverages routing.
Then the same for medium users, but with more shards.
Then the large users each get their own index.
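As a rough illustration of the shared-index idea, one common way to combine it with routing is a filtered alias per customer, created through the `_aliases` API. The sketch below only builds the request body; the index name `small_customers`, the field `customer_id`, and the alias naming scheme are hypothetical and not something stated in this thread.

```python
import json


def customer_alias_action(index, customer_id, field="customer_id"):
    """Build one 'add' action for the _aliases API.

    The alias filters on the customer's id and pins both indexing and
    search to that customer's routing value. All names are illustrative.
    """
    return {
        "add": {
            "index": index,
            "alias": "customer_{}".format(customer_id),
            "filter": {"term": {field: customer_id}},
            "routing": customer_id,
        }
    }


# Request body that would be sent as POST /_aliases
body = {"actions": [customer_alias_action("small_customers", "c42")]}
print(json.dumps(body, indent=2))
```

Note that an alias like this is a convenience for queries, not access control on its own; restricting who may use which alias is a separate concern.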

rahulcse · March 25, 2016, 12:43am

Hi Mark, thank you for your time and your answer.

How can we mix data from different small clients in one index? Will it create a security issue, given that the data is very sensitive?

If we go with a small number of shards per index for each small customer, what problems could we face?
The system is high on query volume and low on writes and updates.

Regards,
Rahul

warkolm · March 25, 2016, 12:44am

Shield can help you deal with this.

Lots of small indices/shards waste system resources.

rahulcse · March 25, 2016, 12:47am

Hi Mark, we have many machines at our disposal, so that's not a big issue.
Also, we had to disable the Java security manager in ES because we are making external calls to other services for social ranking. Will that be an issue with Shield?

Regards,
Rahul

rahulcse · March 25, 2016, 1:11am

Also, if we have many clients in one index, will scoring and IDF get skewed and give us a different ranking?

Please share your thoughts on this also.

Thanks for all your help.

Regards,
Rahul

Christian_Dahlqvist · March 25, 2016, 1:15am

How many users do you expect to support? Are you in control of mappings?

rahulcse · March 25, 2016, 1:20am

Each client is identified by a unique id. There are around 300 small clients (each client has a workforce in the thousands who will query our search clusters).

There are around 400 medium clients and around 40 big ones.

Christian_Dahlqvist · March 25, 2016, 1:33am

Having an index per user does tend to scale badly, and lots of small indices waste system resources due to the overhead associated with each shard. However, if you expect fewer than a thousand users, going with a single, separate index per user may actually work. Small users should probably have indices with just a single shard, and depending on data volumes this may also apply to medium users.

Best way to find out is to test under as realistic conditions as possible.

Make sure that you use Elasticsearch 2.x so you can benefit from delta cluster state updates.
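To get a feel for the shard overhead of an index-per-customer layout, a back-of-the-envelope count helps. The sketch below uses the client counts given earlier in the thread; the shard counts per tier (1 for small and medium, 5 for large) and the single replica are assumptions chosen for illustration, not recommendations from this thread.

```python
def total_shards(tiers, replicas=1):
    """Return (primary shards, primaries plus replica copies).

    `tiers` maps a tier name to (number of customers, shards per index).
    """
    primaries = sum(count * shards for count, shards in tiers.values())
    return primaries, primaries * (1 + replicas)


# Customer counts from the thread; shards per index are assumed values.
tiers = {
    "small": (300, 1),
    "medium": (400, 1),
    "large": (40, 5),
}

primaries, total = total_shards(tiers)
print(primaries, total)  # 900 1800
```

Changing the assumed shard counts or replica count shows how quickly the cluster-wide total grows, which is the number worth testing against under realistic load.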

rahulcse · March 25, 2016, 1:40am

Hi Christian, thank you for your advice. Can you please tell me how IDF scoring is affected if we have multiple customers in one index?

Regards,
Rahul

warkolm · March 25, 2016, 1:51am

Scoring is per shard, so if you use routing then each customer's scoring will be relative to their own data.
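For reference, Elasticsearch picks the shard for a document from its routing value, roughly as `shard = hash(routing) % number_of_primary_shards`. The sketch below mimics that with `zlib.crc32` as a stand-in hash (Elasticsearch uses its own Murmur3-based hash, so actual shard numbers will differ); the customer ids and the shard count of 5 are made up.

```python
import zlib
from collections import defaultdict


def shard_for(routing, num_primary_shards):
    """Map a routing value to a shard number.

    crc32 is only a stand-in for the hash Elasticsearch really uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 5  # assumed shard count for the shared index
customers = ["customer_{}".format(i) for i in range(300)]

# Group customers by the shard their routing value maps to.
by_shard = defaultdict(list)
for customer in customers:
    by_shard[shard_for(customer, NUM_SHARDS)].append(customer)

for shard in sorted(by_shard):
    print(shard, len(by_shard[shard]))
```

The same routing value always maps to the same shard, so one customer's documents stay together. With more customers than shards, each shard necessarily holds several customers.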

rahulcse · March 25, 2016, 4:33am

Thanks a lot for your input; it really broadens our horizons.

Christian_Dahlqvist · March 25, 2016, 8:52am

When you use routing, all documents belonging to a customer will be located in the same shard, but there will be multiple users per shard, so it could impact scoring.
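A small worked example shows why sharing a shard can shift scores. It assumes the classic Lucene TF/IDF formula, `idf = 1 + ln(numDocs / (docFreq + 1))`, which is the default similarity in Elasticsearch 2.x; the document counts and the term are invented. Term statistics are collected per shard, so a query filtered to one customer still sees the document frequencies of every customer on that shard.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF/IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))


# Hypothetical numbers: the term "invoice" is rare for customer A
# but very common for customer B, who shares the same shard.
a_docs, a_freq = 1000, 10
b_docs, b_freq = 9000, 4500

idf_alone = classic_idf(a_freq, a_docs)
idf_shared = classic_idf(a_freq + b_freq, a_docs + b_docs)

print("customer A alone:  {:.3f}".format(idf_alone))
print("A and B together:  {:.3f}".format(idf_shared))
```

In this made-up case the term carries much less weight for customer A once customer B's documents are counted, so A's ranking can change even though A's own data did not.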

rahulcse · March 25, 2016, 10:15pm

Routing is fine when you want to store specific data on certain shards, but here the data is very general; apart from language, there is no way to differentiate it for routing. The queries to the system are general, searching documents by title and body.
Do you have more input on routing?

Thanks a lot for all your help.

Regards,
Rahul
