With the removal of mapping types from the 6.x release onwards, my dynamic application development platform project needs many indices to be created. The indices hold documents that have few similarities with one another.
For example (a rough sketch of the corresponding mappings follows this list):
Form of Application-1
Field - A (String)
Field - B (int)
Field - C (Date)
Form of Application - 2
Field - X (int)
Field - Y (int)
Field - Z (long)
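To make the shape concrete, here is a rough Python sketch of what the index-per-application design would look like. Index names, field names and the localhost URL are just placeholders, and a typeless (7.x-style) mapping body is assumed for brevity:

```python
# Rough sketch only: one index per form of application, each with its own mapping.
# Index names, field names and the URL are placeholders; a typeless (7.x-style)
# mapping body is assumed for brevity.
import requests

ES = "http://localhost:9200"

# Form of Application-1 gets its own index and mapping
requests.put(f"{ES}/tenant1-application1", json={
    "mappings": {
        "properties": {
            "field_a": {"type": "keyword"},
            "field_b": {"type": "integer"},
            "field_c": {"type": "date"},
        }
    }
})

# Form of Application-2 is a separate index with completely different fields
requests.put(f"{ES}/tenant1-application2", json={
    "mappings": {
        "properties": {
            "field_x": {"type": "integer"},
            "field_y": {"type": "integer"},
            "field_z": {"type": "long"},
        }
    }
})
```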
There can be as many as 50 applications per tenant, and the platform can scale up to 500 tenants. So the selected design approach could end up with 500 x 50 = 25,000 indices. However, each index/application might be very small, only a few KB to a couple of MB at most.
I have read on this forum that it is mostly suggested to keep dense data in a minimum number of indices. But in my case there are many models with no overlapping fields, so one option I do see is an index per model (i.e. per form of application in my use case).
My question: is this a good design approach for this use case, or are there better alternatives?
Different types of data can typically share an index as long as they do not have common fields with conflicting mappings. I would therefore recommend reducing the index and shard count even if this means the number of fields goes up. 25,000 indices with at least 50,000 shards is far beyond what is recommended for a cluster.
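Roughly something like this, where the non-overlapping fields of several models sit side by side in one consolidated index (all names here are placeholders, not a prescribed layout):

```python
# Sketch of a consolidated index: fields of different models can coexist because
# they never conflict, and each document only populates the fields of its own model.
# Index name, field names and the URL are placeholders.
import requests

ES = "http://localhost:9200"

requests.put(f"{ES}/applications", json={
    "mappings": {
        "properties": {
            # Form of Application-1
            "field_a": {"type": "keyword"},
            "field_b": {"type": "integer"},
            "field_c": {"type": "date"},
            # Form of Application-2 (sparse: only set on its own documents)
            "field_x": {"type": "integer"},
            "field_y": {"type": "integer"},
            "field_z": {"type": "long"},
        }
    }
})
```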
Hm, in that case the model will not look as clean, but it does reduce the number of indices. However, sharing the fields of 50 models in a single index could mean approximately 500-700 unique fields. Is that still an OK design?
Previously, types within an index solved this kind of problem, but now, with versions later than 5.x, such modelling is challenging.
Elasticsearch has improved how sparse fields are handled in recent versions, so I think having that number of fields per index is preferable to having 50,000+ shards. If tenants have the same models it may also make sense to have an index per model and have the tenants share this. You can add a field indicating tenant and filter on this in your application.
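For the filtering part, the application could wrap every search in a bool query with a tenant filter, roughly like this (index name, field names and the URL are placeholders):

```python
# Sketch of tenant scoping in a shared per-model index: each document carries a
# `tenant` field and every search adds a filter clause on it.
# Index name, field names and the URL are placeholders.
import requests

ES = "http://localhost:9200"

resp = requests.post(f"{ES}/application1/_search", json={
    "query": {
        "bool": {
            "filter": [{"term": {"tenant": "tenant-42"}}],    # restrict to one tenant
            "must":   [{"range": {"field_b": {"gte": 10}}}],  # the actual user query
        }
    }
})
print(resp.json()["hits"])
```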
Since field uniqueness/scoping in my case is per tenant/per application (i.e. per model), I don't have models shared across tenants or applications. As you pointed out, sharing many unique fields in a single index is certainly better than having many indices, considering the performance/scale issues.
From another perspective, I was thinking of using the "nested" object field type per model. The 50 models would then fit into 50 nested object fields in a single index, and I would have clear modelling within the index. Do you see any disadvantage to this approach in indexing, querying or even scaling? Thanks in advance!
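To illustrate what I mean, here is a rough sketch of that nested-object layout and how a query against one model would look (index name, field names and the URL are placeholders):

```python
# Sketch of the nested-object idea: each model becomes a `nested` field inside one
# index, and queries use a `nested` query scoped to that model's path.
# Index name, field names and the URL are placeholders.
import requests

ES = "http://localhost:9200"

# One index, one nested field per model
requests.put(f"{ES}/tenant1-applications", json={
    "mappings": {
        "properties": {
            "application1": {
                "type": "nested",
                "properties": {
                    "field_a": {"type": "keyword"},
                    "field_b": {"type": "integer"},
                    "field_c": {"type": "date"},
                }
            },
            "application2": {
                "type": "nested",
                "properties": {
                    "field_x": {"type": "integer"},
                    "field_y": {"type": "integer"},
                    "field_z": {"type": "long"},
                }
            }
        }
    }
})

# Querying one model then requires a nested query scoped to its path
requests.post(f"{ES}/tenant1-applications/_search", json={
    "query": {
        "nested": {
            "path": "application1",
            "query": {"range": {"application1.field_b": {"gte": 10}}}
        }
    }
})
```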