I have created an index "index-000001" with 5 primary shards and 1 replica, along with two aliases
alias-read -> index-000001
alias-write -> index-000001
for indexing and searching. When alias-write reaches its maximum capacity and I roll it over, a new index "index-000002" is created and the aliases are updated to
alias-read -> index-000001 and index-000002
alias-write -> index-000002
How do I update or delete a document that exists in index-000001 (and what if all I know is the document ID, not which index the document resides in)? I cannot use alias-write because it points to the new index, and I cannot use alias-read because it points to multiple indices.
The rollover functionality is great when you have large amounts of immutable data, but if you need to update documents it complicates things, as you need to know the index name. Using rollover may therefore not be the ideal option for your use case, but it would be easier to give advice if you described your use case in more detail.
I have a multi-tenant application. Instead of one index for all tenants or one index per tenant, we are going with a hybrid approach where one index is shared among a few tenants.
Let's say I create an index "Index-000001" for tenants T1, T2, T3, T4, T5.
In order to make indexing and searching easier, we were planning on creating
T1_write, T1_read
T2_write, T2_read
T3_write, T3_read
T4_write, T4_read
T5_write, T5_read
aliases, all pointing to the same index. So while indexing a document belonging to T3, we use T3_write alias.
Once the index reaches the max limit (let's say we don't want our indices to hold more than 1 million documents), we want to create a new index "Index-000002". We thought of using index aliases to roll over the index: we update all our "_write" aliases to point to the new index and update our "_read" aliases to point to both the new and the old index.
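The alias switch described above can be sketched as a request body for the Elasticsearch `_aliases` API. This is only an illustration under stated assumptions: the helper name `rollover_alias_actions` is hypothetical, the alias names follow the `Tx_write` / `Tx_read` pattern from this post, and the index names are written in lowercase because Elasticsearch requires lowercase index names.

```python
def rollover_alias_actions(tenants, old_index, new_index):
    """Build an `_aliases` request body that moves each tenant's write
    alias to the new index and extends its read alias to cover it.

    Hypothetical helper: it only builds the JSON body, it sends nothing.
    """
    actions = []
    for tenant in tenants:
        write_alias = f"{tenant}_write"
        read_alias = f"{tenant}_read"
        # The write alias must point at exactly one index: drop old, add new.
        actions.append({"remove": {"index": old_index, "alias": write_alias}})
        actions.append({"add": {"index": new_index, "alias": write_alias}})
        # The read alias keeps the old index and additionally gets the new one.
        actions.append({"add": {"index": new_index, "alias": read_alias}})
    return {"actions": actions}


body = rollover_alias_actions(["T1", "T2"], "index-000001", "index-000002")
```

The resulting body would be sent as `POST /_aliases`; all actions in one such request are applied together, so no tenant is left without a write alias in between.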
But if a document that exists in "Index-000001" gets modified, we cannot use the Tx_write alias to index it, as that alias points to the new index. If there were multiple old indices, how would I find out which index my document resides in?
If the above aliasing structure isn't the right way, what would be a better way to architect our system?
That is correct. You would first have to search for the document to identify which index it resides in before updating directly against that index.
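A minimal sketch of that two-step lookup, assuming the search is sent to the read alias and uses an `ids` query. The function names `id_lookup_body` and `resolve_index` are hypothetical, and the code only builds the request body and reads the standard `hits.hits[]._index` field of a search response; it does not talk to a cluster.

```python
def id_lookup_body(doc_id):
    """Search body that finds a document by ID across every index behind
    an alias. `_source` is disabled because only `_index` is needed."""
    return {
        "query": {"ids": {"values": [doc_id]}},
        "_source": False,
        "size": 2,  # ask for 2 hits so a duplicate ID in another index is noticed
    }


def resolve_index(search_response):
    """Return the concrete index holding the document, based on the
    `_index` field of the search hits."""
    hits = search_response["hits"]["hits"]
    if not hits:
        raise LookupError("document not found behind the alias")
    indices = {hit["_index"] for hit in hits}
    if len(indices) > 1:
        raise LookupError(f"document ID exists in several indices: {sorted(indices)}")
    return hits[0]["_index"]


# Example response shape, trimmed to the fields used here.
sample = {"hits": {"hits": [{"_index": "index-000001", "_id": "doc-42"}]}}
target_index = resolve_index(sample)
```

The body would go to `GET alias-read/_search`, and the update or delete is then issued against the returned index name rather than against an alias. The same `ids` query can also be sent to `_update_by_query` or `_delete_by_query` on the read alias, which avoids the separate lookup at the cost of a query-driven operation.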
It is difficult to recommend a specific approach without knowing more about the use case. What type(s) of data are you indexing? How common are updates and/or deletes of this data compared to inserts of new data? How long is data kept in the cluster?
In this scenario a time-based or rollover index is not a natural fit in my opinion. Indices shared by a group of tenants might be a good approach, as it means each tenant always has all its documents in a single index, which makes them easy to update. The number of such shared indices, shards per index and tenants per index depend on the number of tenants as well as the expected data volumes.
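One way to keep every tenant pinned to a single shared index is a fixed tenant-to-index mapping. The sketch below is an assumption, not something prescribed in this thread: the function name `shared_index_for`, the `tenants-` prefix and the hash-based assignment are all hypothetical.

```python
import hashlib


def shared_index_for(tenant_id, num_shared_indices, prefix="tenants-"):
    """Map a tenant to one of a fixed number of shared indices.

    A stable hash (not Python's per-process randomized hash()) is used so
    the same tenant resolves to the same index on every run and machine.
    """
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_shared_indices
    return f"{prefix}{bucket:06d}"


# Every write, update and delete for T3 goes to the same index name.
index_name = shared_index_for("T3", num_shared_indices=4)
```

Changing `num_shared_indices` later would remap tenants to different indices, so an explicit lookup table of tenant to index may be the safer choice if the number of shared indices is expected to grow.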