Effective separation of tenant data in the latest release of Elasticsearch


(Rajesh Kishore) #1

Hi All,

We want to use Elasticsearch as a multi-tenant store, where each tenant has different requirements for document types/schemas.

What is the best way to store the data with respect to cost and manageability?

1> Each tenant gets a separate index with its own document types. This may not be efficient?

2> A set of tenants shares one index with varying document types.
But with Elasticsearch's removal of mapping types, as described in the link, it seems this is only possible by using a custom type field.

Please advise: what is the best way to separate tenants' data when each tenant has its own schema/document type requirements?

Thanks,
Rajesh


(Mark Walkom) #2

That custom type is literally just a field and value, there's nothing special about it.
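To illustrate the point: a "custom type" is just an ordinary `keyword` field that every document carries, next to a tenant identifier, and queries filter on both. The sketch below only builds the request bodies as Python dicts. The field names `tenant_id` and `doc_type`, the helper names, and the typeless mapping format (7.x and later; 6.x still wraps `properties` in a single type name) are assumptions for illustration.

```python
import json

# Hypothetical field names: Elasticsearch treats them as ordinary keyword fields.
def shared_index_mapping(extra_properties):
    """Build a typeless mapping body for an index shared by several tenants."""
    properties = {
        "tenant_id": {"type": "keyword"},  # which tenant owns the document
        "doc_type": {"type": "keyword"},   # the "custom type": just a field and value
    }
    properties.update(extra_properties)
    return {"mappings": {"properties": properties}}


def tenant_query(tenant_id, doc_type, text_field, text):
    """Full-text search restricted to one tenant and one custom type."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: text}}],
                # Filters do not affect scoring and can be cached.
                "filter": [
                    {"term": {"tenant_id": tenant_id}},
                    {"term": {"doc_type": doc_type}},
                ],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(shared_index_mapping({"title": {"type": "text"}}), indent=2))
    print(json.dumps(tenant_query("acme", "invoice", "title", "overdue"), indent=2))
```

Note that every tenant sharing the index also shares this one mapping, so a field such as `title` cannot be `text` for one tenant and `date` for another.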


(Rajesh Kishore) #3

So could you please advise on the best strategy?


(Mark Walkom) #4

If you want to separate by customer, you will probably need to split out documents that are not similar, so you may need multiple indices per customer.
If you want to group by document similarity, that would be OK; you just need to manage multi-tenancy with something like Security.

The best solution is whichever of those works for you; they both have pros and cons.
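For the shared-index option, one way Security can enforce isolation is document-level security: a role carries a query, and users holding that role only see matching documents. The sketch below builds such a role body; the role contents, the index pattern `shared-*`, and the `tenant_id` field are hypothetical, and you should confirm that document-level security is available in your version and license. The body would be sent to the role API (`_xpack/security/role/<name>` in 6.x, `_security/role/<name>` in 7.x).

```python
import json


def tenant_read_role(tenant_id, index_pattern="shared-*"):
    """Role body granting read access to one tenant's documents only.

    Assumes documents carry a hypothetical keyword field `tenant_id`.
    """
    tenant_filter = {"term": {"tenant_id": tenant_id}}
    return {
        "indices": [
            {
                "names": [index_pattern],
                "privileges": ["read"],
                # Document-level security: the query is given as a JSON string,
                # as in the role API examples.
                "query": json.dumps(tenant_filter),
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(tenant_read_role("acme"), indent=2))
```

This keeps the filter out of application code, which matters if tenants can query the cluster directly.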


(Rajesh Kishore) #5

But won't multiple indices per customer affect performance? Also, we won't have similar document types within a tenant or across tenants.


(Christian Dahlqvist) #6

How many tenants are you expecting? How much control do you have over the data?


(Rajesh Kishore) #7

There could be many, because initially we will have a lot of free customers. It's not possible to quantify that right now, as this is a cloud service we are building.


(Christian Dahlqvist) #8

As mappings have to be consistent per index, you will need to impose some control on the content and mappings if you want tenants to share indices. This is usually necessary, as having an index per tenant scales badly: lots of small indices will result in performance problems.

There are no easy solutions, but I have seen users place controls on the data and have small users share indices and let a smaller number of larger users have their own.
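The hybrid approach above (small tenants share indices, a few large tenants get their own) can be sketched as a placement function. Because mappings must be consistent within an index, small tenants are pooled per controlled schema. The threshold, bucket count, naming scheme, and function name are all hypothetical policy choices, not Elasticsearch features.

```python
import zlib

# Hypothetical policy values; tune them from your own measurements.
DEDICATED_THRESHOLD_DOCS = 5_000_000
SHARED_BUCKETS_PER_SCHEMA = 4


def index_for_tenant(tenant_id, schema, doc_count, dedicated=frozenset()):
    """Pick the index a tenant's documents should be written to.

    Large tenants (or ones explicitly promoted) get their own index; small
    tenants share a pooled index with other tenants using the same schema.
    """
    if tenant_id in dedicated or doc_count >= DEDICATED_THRESHOLD_DOCS:
        return f"tenant-{tenant_id}-{schema}"
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    bucket = zlib.crc32(tenant_id.encode("utf-8")) % SHARED_BUCKETS_PER_SCHEMA
    return f"shared-{schema}-{bucket}"
```

Promoting a tenant from a shared index to a dedicated one would mean reindexing that tenant's documents, so the threshold is worth choosing conservatively.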

I have seen users try going with one index per tenant and then deploy this across a lot of small clusters. This reduces the size of the cluster state per cluster, but it does not necessarily scale well either.
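If you try the many-small-clusters layout anyway, something has to remember which cluster holds which tenant. The sketch below is a hypothetical in-memory directory (a real one would need persistent storage); the class name and the per-cluster index cap are assumptions. It records each assignment rather than hashing tenant IDs modulo the cluster count, because modulo hashing would move existing tenants whenever a cluster is added.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterDirectory:
    """Hypothetical tenant -> cluster directory for one-index-per-tenant setups."""
    max_indices_per_cluster: int
    clusters: dict = field(default_factory=dict)    # cluster name -> set of tenant ids
    assignment: dict = field(default_factory=dict)  # tenant id -> cluster name

    def add_cluster(self, name):
        self.clusters.setdefault(name, set())

    def cluster_for(self, tenant_id):
        """Return the tenant's cluster, placing new tenants on the least-loaded one."""
        if tenant_id in self.assignment:
            return self.assignment[tenant_id]
        if not self.clusters:
            raise RuntimeError("no clusters registered")
        # Ties go to the cluster registered first (dict insertion order).
        name = min(self.clusters, key=lambda c: len(self.clusters[c]))
        if len(self.clusters[name]) >= self.max_indices_per_cluster:
            raise RuntimeError("all clusters are full; provision another cluster")
        self.clusters[name].add(tenant_id)
        self.assignment[tenant_id] = name
        return name
```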


(Rajesh Kishore) #9

I get the idea to some extent; let me do more research on this and I will come back. In the meantime, more suggestions are highly appreciated.


(Christian Dahlqvist) #10

This has been asked before, so you may find additional points if you search the forum.

