I have read a lot on the forums about the pros and cons of multiple indices vs. a single index, and I would like some advice for my specific use case. I have a pipeline that ingests Word documents from clients and then runs a query against them. Neither the query nor the documents change, so under normal circumstances only one query is executed and the results are expected to be the same every time. Is it better for me to have:
single index for multiple customers and all of their documents
single index for each customer and all of their documents
single index for each customer and single index for a set of documents (for one query)
single index for every set of documents but dropped after query is completed
The answer is: it depends. Do you have specific retention periods for different customers? Do you want to be able to bill based on resource use (e.g. disk)? Do you have security or privacy requirements for some customers?
If so, then splitting by customer might make a lot of sense.
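If you do go per customer, a consistent naming scheme makes retention, billing and access rules easier to apply by pattern. Here is a minimal sketch of one way to derive index names; the `docs` prefix, the `index_name` helper and the ID formats are all hypothetical and not taken from your setup. It only builds the string (lowercase, no spaces or special characters, as index names require) and does not talk to the cluster.

```python
import re
from typing import Optional


def index_name(customer_id: str, doc_set_id: Optional[str] = None,
               prefix: str = "docs") -> str:
    """Build an index name per customer, or per customer and document set.

    Hypothetical helper: the prefix and ID formats are assumptions.
    """
    def clean(part: str) -> str:
        # Lowercase, then collapse anything outside a-z / 0-9 into a single '-'
        cleaned = re.sub(r"[^a-z0-9]+", "-", part.lower()).strip("-")
        if not cleaned:
            raise ValueError(f"cannot derive an index name part from {part!r}")
        return cleaned

    parts = [prefix, clean(customer_id)]
    if doc_set_id is not None:
        # Option 3/4 in the question: one index per set of documents
        parts.append(clean(doc_set_id))
    return "-".join(parts)


if __name__ == "__main__":
    print(index_name("Acme Corp"))             # index per customer
    print(index_name("Acme Corp", "Batch_7"))  # index per customer and document set
```

With names like these, a wildcard pattern on the customer part would cover all of one customer's indices, which is handy if you later drop an index once its query has completed.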