We have a multi-tenant use case where we need to index data coming from different tenants. Tenants may share a common set of fields, and each tenant can add its own fields. In some cases, most of the fields differ between tenants.
How can we design a generic mapping/schema that works for any tenant? Maintaining one mapping file per tenant would be a big operational headache.
Our initial plan was to come up with a generic schema that looks like this:
string_field_1
string_field_2
.
.
string_field_25
int_field_1
int_field_2
.
.
int_field_25
We plan to treat nested fields the same way, in a separate section. We would define many placeholder fields (as shown above) to make sure that every tenant can be supported.
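Because the placeholder schema is completely regular, it could be generated rather than written by hand. The sketch below assumes an Elasticsearch-style mapping in which string slots become `text` fields, int slots become `integer` fields, and the nested section uses the `nested` type. The names `tenant_id` and `nested_section` and the helper functions are hypothetical, chosen only for illustration.

```python
# Hypothetical generator for the placeholder schema described above.
# Assumes an Elasticsearch-style mapping: "text" for strings, "integer" for
# ints, and a "nested" object for the nested section.
import json

SLOTS_PER_TYPE = 25  # matches string_field_1 .. string_field_25 in the plan

# Placeholder prefix -> assumed engine field type
SLOT_TYPES = {"string": "text", "int": "integer"}


def placeholder_properties(slots: int = SLOTS_PER_TYPE) -> dict:
    """Build string_field_1..N and int_field_1..N property definitions."""
    props = {}
    for prefix, engine_type in SLOT_TYPES.items():
        for i in range(1, slots + 1):
            props[f"{prefix}_field_{i}"] = {"type": engine_type}
    return props


def generic_mapping(slots: int = SLOTS_PER_TYPE) -> dict:
    """One shared mapping for every tenant: flat slots plus a nested section."""
    props = placeholder_properties(slots)
    props["tenant_id"] = {"type": "keyword"}  # hypothetical tenant discriminator
    props["nested_section"] = {  # hypothetical name for the nested section
        "type": "nested",
        "properties": placeholder_properties(slots),
    }
    return {"mappings": {"properties": props}}


if __name__ == "__main__":
    print(json.dumps(generic_mapping(slots=2), indent=2))
```

With the default of 25 slots per type this yields 50 flat placeholder fields, the same 50 inside the nested section, plus the discriminator field.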
We would then map each tenant's fields onto placeholder fields in the generic schema.
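The per-tenant mapping step could be kept as a small registry that hands out the next free slot of the right type and renames document fields before indexing. This is a minimal sketch under stated assumptions: only two slot kinds (`string`, `int`), slots assigned in first-come order, and the class `TenantFieldRegistry` is hypothetical. In practice the registry would need to be persisted, and the same mapping would be needed to rewrite queries.

```python
# Hypothetical per-tenant field registry for the placeholder-slot approach.
# Assumes two slot kinds ("string", "int") with a fixed number of slots each.
from typing import Any, Dict

SLOTS_PER_TYPE = 25


class TenantFieldRegistry:
    """Assigns each tenant's own field names to free placeholder slots."""

    def __init__(self, slots_per_type: int = SLOTS_PER_TYPE) -> None:
        self.slots_per_type = slots_per_type
        # tenant -> {tenant field name -> placeholder slot name}
        self.assignments: Dict[str, Dict[str, str]] = {}

    def register(self, tenant: str, field: str, kind: str) -> str:
        """Return the slot for (tenant, field), assigning a new one if needed."""
        fields = self.assignments.setdefault(tenant, {})
        if field in fields:
            return fields[field]  # already assigned; keep the slot stable
        prefix = f"{kind}_field_"
        used = sum(1 for slot in fields.values() if slot.startswith(prefix))
        if used >= self.slots_per_type:
            raise ValueError(f"tenant {tenant!r} has no free {kind} slots left")
        slot = f"{prefix}{used + 1}"
        fields[field] = slot
        return slot

    def to_index_doc(self, tenant: str, doc: Dict[str, Any]) -> Dict[str, Any]:
        """Rename a tenant document's fields to their placeholder slots."""
        fields = self.assignments.get(tenant, {})
        out: Dict[str, Any] = {}
        for name, value in doc.items():
            if name not in fields:
                raise KeyError(f"field {name!r} is not registered for {tenant!r}")
            out[fields[name]] = value
        return out
```

For example, registering `title` (string) and `price` (int) for tenant `acme` yields `string_field_1` and `int_field_1`, while registering `sku` (string) for tenant `globex` also yields `string_field_1`. That collision, one physical field holding unrelated data from different tenants, is exactly what raises the scoring concern.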
I am a little worried about this approach because TF-IDF statistics are computed per field, and we would be mixing multiple unrelated fields from different tenants into the same field. I am also worried that it would make our ranking queries fragile and much more complicated.
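To make the concern concrete, here is a toy calculation using the textbook formula idf(t) = ln(N / df(t)), computed over the documents that have a value in a given field. This is not the exact formula any particular search engine uses; it only illustrates the direction of the effect. The tenants, documents, and the function `field_idf` are hypothetical: tenant A stores product titles in `string_field_1`, while tenant B stores a color attribute in the same slot.

```python
# Illustrative only: textbook IDF, idf(t) = ln(N / df(t)), computed per field.
# Real engines use their own variants; this just shows how sharing a field
# across tenants changes the statistics a tenant's query is scored against.
import math
from typing import Dict, List


def field_idf(docs: List[Dict[str, str]], field: str, term: str) -> float:
    """IDF of `term` among documents that have a value in `field`."""
    values = [d[field].lower().split() for d in docs if field in d]
    df = sum(1 for tokens in values if term in tokens)
    return math.log(len(values) / df) if df else math.inf


# Hypothetical tenant A: product titles mapped to string_field_1
tenant_a = [
    {"string_field_1": "red running shoe"},
    {"string_field_1": "blue running shoe"},
    {"string_field_1": "green running shoe"},
    {"string_field_1": "black running shoe"},
]
# Hypothetical tenant B: a color attribute mapped to the same slot
tenant_b = [
    {"string_field_1": "red"},
    {"string_field_1": "red"},
    {"string_field_1": "red"},
    {"string_field_1": "blue"},
]

print(field_idf(tenant_a, "string_field_1", "red"))             # ln(4)
print(field_idf(tenant_a + tenant_b, "string_field_1", "red"))  # ln(2)
```

Within tenant A's own data, "red" appears in 1 of 4 titles (idf = ln 4 ≈ 1.39); once tenant B's color values share the field, it appears in 4 of 8 values (idf = ln 2 ≈ 0.69). The weight of a term in tenant A's queries ends up depending on tenant B's data.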
Is this a good approach? Does anyone have a better approach that has worked well for them?