Hi,
We are planning to use Elasticsearch in a multi-tenant environment, based on this article:
https://www.elastic.co/blog/found-multi-tenancy
We want to use a shared index model for all tenants, but we have run into an issue with field mapping collisions: TenantA can have fields 1, 2, 3 whereas TenantB can have fields 2, 4, 5. If we keep adding these fields to the index, I think we might hit the limit of 1000 fields per index. Any advice on how to overcome this issue? Thanks in advance.
You could do something like:
[
{ "custom_field0": "field_1", "value": "whatever" },
{ "custom_field1": "field_2", "value": "something else" }
]
And then repeat that.
How are you exposing the data? Do you have an application layer between the user and the data? Do you allow users to use Kibana or other types of more direct access?
@Christian_Dahlqvist thanks for the reply. Yes, we have an application layer between the user and the data. No, we are not going to expose Kibana to users, but we will expose it to administrators.
@warkolm thanks for the reply. I believe your answer would address the field mapping collision. My question is about the number of fields: if 10 tenants use the shared index and each tenant has 100 distinct fields, those 10 tenants will exhaust the 1000-field limit for the index, making it impossible to add new tenants. Is there a way to arrange the fields per tenant so that we do not hit the limit?
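To make the arithmetic concrete, here is a minimal sketch that counts how many distinct field names a shared index would have to map, using the TenantA/TenantB example from the original question. The constant reflects the default of the index.mapping.total_fields.limit setting; the function name and the field names field_1 to field_5 are only illustrative. The real count in a mapping can be higher, for example when string fields are mapped with sub-fields.

```python
# Default value of the index.mapping.total_fields.limit index setting.
DEFAULT_FIELD_LIMIT = 1000


def distinct_field_count(tenant_fields):
    """Count the distinct field names a shared index would need to map."""
    all_fields = set()
    for fields in tenant_fields.values():
        all_fields.update(fields)
    return len(all_fields)


# Illustrative field names taken from the example in the question.
tenant_fields = {
    "TenantA": {"field_1", "field_2", "field_3"},
    "TenantB": {"field_2", "field_4", "field_5"},
}

used = distinct_field_count(tenant_fields)
remaining = DEFAULT_FIELD_LIMIT - used
print(f"distinct fields: {used}, remaining before the limit: {remaining}")
```

Fields shared between tenants are only counted once, so the limit is reached when the union of all tenants' field names grows too large, not simply when tenants multiplied by fields exceeds it.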
One way I have seen this addressed in the past is through a translation layer in the application, where the customer field names are mapped to standard names that are shared across tenants. This assumes each tenant only ever has access to its own data. You would have a number of predefined fields per type, e.g. string_field_1, string_field_2, long_field_1, long_field_2, date_field_1, date_field_2, etc. Tenant 1 may have a name field that gets mapped to string_field_1, while tenant 2 could have a category field mapped to string_field_1. This naturally has an impact on relevancy and scoring, but as each user only has access to their own documents this may be acceptable.
Another benefit of this approach is that you avoid mapping conflicts, which would happen if 2 tenants wanted to store different data in a field with the same name.
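As a rough illustration of how the application could hand out the shared field names, here is a minimal sketch. The SlotAllocator class, its method names, and the pool sizes in SLOTS_PER_TYPE are all hypothetical and not part of Elasticsearch; a real application would also persist these assignments rather than keep them in memory.

```python
from collections import defaultdict

# Hypothetical pool sizes: how many shared fields of each type the index mapping defines.
SLOTS_PER_TYPE = {"string": 50, "long": 20, "date": 10}


class SlotAllocator:
    """Assigns each tenant field to the next free shared field of its type."""

    def __init__(self, slots_per_type=SLOTS_PER_TYPE):
        self.slots_per_type = slots_per_type
        # tenant -> {tenant field name -> shared field name}
        self.assignments = defaultdict(dict)
        # (tenant, field type) -> number of slots already handed out
        self.used = defaultdict(int)

    def assign(self, tenant, field_name, field_type):
        mapping = self.assignments[tenant]
        if field_name in mapping:
            # The field was assigned earlier; keep the mapping stable.
            return mapping[field_name]
        next_slot = self.used[(tenant, field_type)] + 1
        if next_slot > self.slots_per_type[field_type]:
            raise ValueError(f"{tenant} has no free {field_type} fields left")
        self.used[(tenant, field_type)] = next_slot
        shared_name = f"{field_type}_field_{next_slot}"
        mapping[field_name] = shared_name
        return shared_name


allocator = SlotAllocator()
print(allocator.assign("tenant1", "name", "string"))
print(allocator.assign("tenant2", "category", "string"))
```

Because slots are counted per tenant, two tenants can reuse the same shared field for unrelated data, and the total number of fields in the index is bounded by the pool sizes rather than by the number of tenants.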
@Christian_Dahlqvist
If tenant1 has 2 string fields, Name and Description, then those will be mapped under Tenant_1_string_field, something like:
Tenant_1_string_field:
{ "custom_field0": "Name", "value": "whatever"},
{ "custom_field1": "Description", "value": "something else"}
Is this correct? If so, are there any limits on the number of subfields? Also, what about search: are all queries supported?
No. I am not sure what Mark is suggesting, but I suspect it relates to using nested documents, which can have a significant impact on how you query the data as well as on performance.
Assume tenant1 has a field named name (string, translated to string_field_1) and another field named age (long, translated to long_field_2) and submits the following document:
{
"name": "Bob",
"age": 32
}
This would be stored in Elasticsearch as follows:
{
"tenant": "tenant1",
"string_field_1": "Bob",
"long_field_2": 32
}
Naturally, this keeps the number of fields in the index down and avoids mapping conflicts, as all tenants use the same fields. However, it means that all documents submitted and requested by the client need to be translated and queries need to be rewritten, which requires a dictionary in your application.
This complicates the application tier quite a bit, but should scale and perform well.
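A minimal sketch of what that dictionary and translation could look like, using the tenant1 example above. The FIELD_MAP structure and the helper names to_stored, to_client and term_query are assumptions made for illustration; the query helper also assumes the tenant field is mapped as a keyword so that a term filter matches it exactly.

```python
# Hypothetical per-tenant dictionary; in practice this would be stored by the application.
FIELD_MAP = {
    "tenant1": {"name": "string_field_1", "age": "long_field_2"},
}


def to_stored(tenant, doc):
    """Translate a client document into the shared field names used in the index."""
    mapping = FIELD_MAP[tenant]
    stored = {"tenant": tenant}
    for key, value in doc.items():
        stored[mapping[key]] = value  # raises KeyError for fields not in the dictionary
    return stored


def to_client(tenant, stored):
    """Translate a stored document back into the tenant's own field names."""
    reverse = {shared: name for name, shared in FIELD_MAP[tenant].items()}
    return {reverse[key]: value for key, value in stored.items() if key in reverse}


def term_query(tenant, field, value):
    """Rewrite a simple exact-match query, always filtering on the tenant."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"tenant": tenant}},
                    {"term": {FIELD_MAP[tenant][field]: value}},
                ]
            }
        }
    }


stored = to_stored("tenant1", {"name": "Bob", "age": 32})
print(stored)
print(to_client("tenant1", stored))
print(term_query("tenant1", "age", 32))
```

Adding the tenant filter inside the rewrite step, rather than leaving it to callers, is one way to enforce the assumption that each tenant only ever sees its own documents.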