We have a use case for full-text search over application data. We will manage data for many clients: around 100 in total, with each client's data ranging from 1 GB to 50 GB. The data is updated frequently, and we will sync those same updates to Elasticsearch.
Which approach would be best?
1. Multiple small/mid/large indices, each with several very small shards
2. A single index containing all the data, where we can have a few shards of the suggested 30 GB size
Note: The schema/fields are fixed for all the data. There will be around 10 fields: 4-5 will be of text type, and the others will be of long/timestamp type.
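To make the trade-off between the two options concrete, here is a rough back-of-the-envelope sketch of the primary shard counts. It assumes the 30 GB target shard size from the question and at least one primary shard per index; the size distribution across the 100 clients is purely hypothetical, and replicas are ignored.

```python
import math

TARGET_SHARD_GB = 30  # suggested shard size mentioned in the question


def primary_shards(size_gb, target_gb=TARGET_SHARD_GB):
    """Primary shards needed to stay at or below the target shard size."""
    return max(1, math.ceil(size_gb / target_gb))


def compare(client_sizes_gb):
    """Return (shards with one index per client, shards with a single shared index)."""
    per_client = sum(primary_shards(size) for size in client_sizes_gb)
    single_index = primary_shards(sum(client_sizes_gb))
    return per_client, single_index


# Hypothetical distribution: 50 clients at 1 GB, 30 at 10 GB, 20 at 50 GB (1350 GB total).
sizes = [1] * 50 + [10] * 30 + [50] * 20
per_client, single_index = compare(sizes)
print(f"index per client: {per_client} primary shards")   # 120
print(f"single index:     {single_index} primary shards")  # 45
```

Under these assumptions the per-client layout ends up with many shards far below the target size, while the single index stays close to it. Real numbers will depend on the actual size distribution.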
If you are going to have different clients, you should try to keep each client's data separate from the others; in this case, you should use an index-per-client strategy.
Using a single index with all the data makes it more complicated to manage who is allowed to read which data. You would need to use document-level security, which is a paid feature.
Currently we have a single index containing all the data, and we use an 'alias with condition' (a filtered alias) for each client to make sure only that client's data is fetched. It is working completely fine.
So we need to look at the other aspects I asked about.
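For readers unfamiliar with the 'alias with condition' setup described above, here is a minimal sketch of the request body for the `_aliases` API that would create one filtered alias per client. The index name `app-data`, the field name `client_id`, the `client-<id>` alias naming, and the client IDs are all assumptions for illustration, not taken from the actual setup. The sketch only builds the JSON body and does not send it anywhere.

```python
import json


def filtered_alias_action(index, client_id, field="client_id"):
    """Build one 'add' action: an alias that only matches this client's documents."""
    return {
        "add": {
            "index": index,
            "alias": f"client-{client_id}",          # hypothetical naming scheme
            "filter": {"term": {field: client_id}},  # assumes a keyword field
        }
    }


def aliases_request(index, client_ids):
    """Body for POST /_aliases covering all given clients."""
    return {"actions": [filtered_alias_action(index, cid) for cid in client_ids]}


body = aliases_request("app-data", ["acme", "globex"])
print(json.dumps(body, indent=2))
```

Searches sent to `client-acme` would then only see documents whose `client_id` is `acme`. Note that a filtered alias restricts what a query returns but does not by itself stop a user from querying the underlying index directly.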