I want to create many (~10k) indices, each holding a couple of hundred documents. I have read elsewhere that this is discouraged, because every index comes with at least one shard and that adds up to a lot of overhead. However, I have the requirement that the documents are indexed separately, i.e. I do not want the text statistics of a document belonging to index A to influence those in index B. Any ideas on how to solve this?
I was thinking of having a few indices, but with many mappings, where each mapping would hold only the documents that should be indexed together. Would that work?
Is it also possible to solve this with aliases, i.e. have many aliases that point to the same index? Would that also respect the "separate indexing"? That is at least my understanding of what was suggested in the forum here.
Many aliases are not the same as many indices; aliases carry far less overhead than 10k indices.
By default, one node has a limit of 1000 open shards. You can raise it to 2000 if you want, but then the node may behave differently, and I can't tell how.
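To put rough numbers on that limit, here is a back-of-the-envelope sketch in Python. It assumes one primary and one replica per index and that replicas count toward the per-node limit; the function name `min_data_nodes` is made up for illustration, so adjust the parameters to your own setup.

```python
import math

def min_data_nodes(indices, primaries_per_index=1, replicas_per_primary=1,
                   max_shards_per_node=1000):
    """Return (total shards, minimum data nodes needed to stay under the limit)."""
    # Every primary shard is copied `replicas_per_primary` times.
    total_shards = indices * primaries_per_index * (1 + replicas_per_primary)
    # Round up: a partly filled node is still a node.
    return total_shards, math.ceil(total_shards / max_shards_per_node)

total, nodes = min_data_nodes(10_000)
print(f"{total} shards need at least {nodes} data nodes at 1000 shards per node")
```

This only counts shards against the limit. It says nothing about how much heap or cluster-state overhead each tiny shard costs.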
One index has exactly one mapping; it can't have multiple mappings.
A mapping basically tells you what field types you have in that index: text, keyword, integer, boolean, etc.
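For illustration, this is roughly what such a mapping looks like as the body of an index-creation request, built here as a plain Python dict. All field names (`title`, `group_id`, `page_count`, `published`) are hypothetical, since no example record has been posted yet.

```python
import json

# Body for creating an index with an explicit mapping.
# Field names are hypothetical placeholders, not taken from real data.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # analyzed full-text field
            "group_id": {"type": "keyword"},   # exact-value field, usable in filters
            "page_count": {"type": "integer"},
            "published": {"type": "boolean"},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```

A keyword field like `group_id` is the kind of field a filtered alias or a plain filter could use to separate groups of documents inside one index.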
If you post some example records, it will give a better idea of how to organize the data. Nowadays there are many different ways to do the same thing.
I have actually just tried creating a single index with many aliases and a filter per alias. However, that does not seem to respect analyzing the documents "per alias" separately. At least when I do a search with /_explain, it tells me some terms were found in 1000s of documents in the index (but for that alias, there are only ~100 documents).
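To illustrate why this matters for scoring, here is a small Python sketch of the inverse document frequency term as used by Lucene's BM25 similarity, computed once with index-wide counts and once with counts restricted to a single group. The numbers are invented for illustration; a filtered alias only restricts which documents match, so the score would be based on the first kind of counts (strictly speaking, the per-shard ones).

```python
import math

def bm25_idf(doc_freq, doc_count):
    """IDF component of BM25: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical counts: the term occurs in 3000 of 1,000,000 documents overall,
# but in 30 of the 100 documents that belong to one alias/group.
index_wide = bm25_idf(doc_freq=3000, doc_count=1_000_000)
group_only = bm25_idf(doc_freq=30, doc_count=100)

print(f"idf with index-wide statistics: {index_wide:.3f}")
print(f"idf with group-only statistics: {group_only:.3f}")
```

With these made-up counts the term looks rare across the whole index but common inside the group, so the two setups would weight it quite differently.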
Aliases can only offer filtering and, as you correctly pointed out, do not affect relevancy. Having lots of aliases could in itself be an issue anyway. I suspect grouping data as Yassine suggested might be a good compromise.