Multiple small indexes or one index with potential mapping explosion

Background
Having read the various posts on multiple indexes with a small number of documents versus a single index with all the documents, I am still not sure what to do in my situation.

We have approximately 2-3 million documents spread across what would be 300-400 'types'. When I say type, each type is the same fundamental document, but each one can have an unlimited number of associated data attributes. My preference was to index them with one index per type (300-400 indexes), all behind a common alias. This would ensure that a single index does not risk type clashes on those data attributes, or a potential mapping explosion.

But the overhead of each shard has made me think that this is not advisable. If I am wrong about this and I can have several hundred indexes, that would be great as all the other problems go away.
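To put a rough number on that shard overhead, here is a minimal back-of-the-envelope sketch. It assumes one primary shard and one replica per index, which are my assumptions and not figures from this thread; the helper name shards_per_node is hypothetical.

```python
import math

def shards_per_node(index_count, primaries_per_index=1, replicas=1, nodes=3):
    """Return (total shards, shards per node rounded up).

    primaries_per_index and replicas are assumed values, not taken from the post.
    """
    total = index_count * primaries_per_index * (1 + replicas)
    return total, math.ceil(total / nodes)

# The post mentions 300-400 types, so check both ends of that range.
for count in (300, 400):
    total, per_node = shards_per_node(count)
    print(f"{count} indexes -> {total} shards, about {per_node} per node")
```

Under those assumptions the cluster would carry a few hundred shards per node, which is the figure to weigh against the available heap.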

Assuming I cannot have this many indexes on a 3 node cluster, the alternative is to put them in the same index. We are happy not to index the associated data attributes (index: false), which removes the risk of type clashes while still persisting them in _source. This is fine, but I still want those attributes added to the mappings, just with indexing disabled. However, we are hitting the configurable limit of 1000 fields per index (index.mapping.total_fields.limit).
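For illustration, the single-index setup described above could look roughly like the request body built below. This is only a sketch: the attribute names, the parent object name "attributes", and the limit of 5000 are made-up values, and the helper build_index_body is hypothetical. It only builds the JSON; sending it to a cluster is left out.

```python
import json

def build_index_body(attribute_types, field_limit):
    """Build a create-index body where every attribute is mapped but not indexed.

    attribute_types maps a (hypothetical) attribute name to an Elasticsearch field type.
    """
    properties = {
        name: {"type": es_type, "index": False}  # mapped, kept in _source, not searchable
        for name, es_type in attribute_types.items()
    }
    return {
        # Raises the per-index field limit; 1000 is the default.
        "settings": {"index.mapping.total_fields.limit": field_limit},
        "mappings": {"properties": {"attributes": {"properties": properties}}},
    }

body = build_index_body({"colour": "keyword", "weight_kg": "double"}, field_limit=5000)
print(json.dumps(body, indent=2))
```

Note that fields mapped with index: false still count towards the field limit, which is why the limit is reached in the first place.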

Questions

  1. Can I have several hundred indexes running on a 3 node cluster? Each index would have between 100-100,000 documents in it.

  2. If 99% of my mappings have indexing disabled, is there any overhead in leaving dynamic turned on so the mappings are updated? I would need to increase the max fields per index to the region of 100,000. I know this is way above the recommended number of fields, but if indexing is disabled on the majority of those fields, does this recommended limit still stand?

The answer here involves a lot of "it depends", so let me try to unpack this a little.

First, the main problem with searching so many indices is that you will always hit every single shard when searching across your whole dataset, and that will slow things down. You may be able to hit only the relevant indices by using the constant_keyword datatype. Second, even though it is not recommended, this setup might still be worth a try, though I expect that at a certain level of parallel queries you will see rejections because of the sheer number of shards to query. That said, it might still be worth testing if you are aware of the issues and have a plan to fix them once they become more prominent.
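As a rough sketch of the constant_keyword idea: each per-type index declares its type as a constant value, and searches against the common alias filter on that field, so indices with a different constant can be ruled out cheaply. The field names doc_type and title, the type name "invoice", and both helper functions are hypothetical and only build the JSON bodies.

```python
import json

def build_type_index_body(type_name):
    """Mapping for one per-type index; every document in it shares the same doc_type."""
    return {
        "mappings": {
            "properties": {
                "doc_type": {"type": "constant_keyword", "value": type_name}
            }
        }
    }

def build_type_search(type_name, text):
    """Search body for the common alias, restricted to a single type."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"title": text}}],  # 'title' is an assumed field
                "filter": [{"term": {"doc_type": type_name}}],
            }
        }
    }

print(json.dumps(build_type_index_body("invoice"), indent=2))
print(json.dumps(build_type_search("invoice", "overdue"), indent=2))
```

Whether this removes enough per-shard work for your query load is something only a test on your own cluster can show.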

Your second question is hard to answer, as this depends on the number of mappings added over time.

Maybe the flattened datatype can help in your use case as well, to prevent a mapping explosion.
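A minimal sketch of what that could look like, assuming the variable attributes live under a single object called "attributes" (a hypothetical name, as are "name", "colour" and the helper functions). With the flattened type the whole object is one mapped field, so new attribute keys do not add entries to the mapping; the trade-off is that all leaf values are treated as keywords.

```python
import json

def build_flattened_body():
    """Create-index body where all variable attributes share one flattened field."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},             # assumed common field
                "attributes": {"type": "flattened"},  # one mapping entry for all keys
            }
        }
    }

def build_attribute_query(key, value):
    """Term query on a single key inside the flattened object."""
    return {"query": {"term": {f"attributes.{key}": value}}}

print(json.dumps(build_flattened_body(), indent=2))
print(json.dumps(build_attribute_query("colour", "red"), indent=2))
```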

Sorry for not replying sooner. I guess this forum doesn't automatically email you. But thanks for the help.

With regard to my second question, there would be thousands of mappings, potentially 10k. But I think my question was really: is the number of mappings actually relevant if all of them have indexing disabled?
