Hi,
I am working on a multi-tenant application. I have to create separate
indexes for each tenant, and I need to keep 3 months of data per tenant.
I think there are two ways to implement this requirement:
1/ Use _ttl to auto-delete data from the index. This would mean creating
just one index per tenant, with 4 shards and 4 replicas.
2/ Create an index for each month. This would mean creating at least 4
indexes per tenant to cover the 3-month retention requirement. I would
then maintain one index alias that lets me search across all 4 indexes.
With this strategy, I would insert data only into the current month's
index. At the end of the month, I would create a new index for the next
month and delete the oldest one. If I implement rolling indexes, I am
thinking of creating each index with just one shard and one replica.
Which strategy is better in this case, _ttl or rolling indexes?
If rolling indexes are preferable, it would be good to know the
disadvantages of the _ttl strategy.
Dropping an index just removes the index directory, which is quick and releases the disk space immediately.
Using TTL is like deleting docs one by one. Each delete writes a delete marker (in effect a new, empty version of the document), which still takes some space.
When ES optimizes (merges) the index, the deleted documents are removed and the space is released. You can also call the optimize API yourself.
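For reference, here is a rough sketch of how the optimize call could be addressed over HTTP. It only builds the URL and sends nothing. The host, the index name "tenant42" and the helper name optimize_url are made up for illustration; _optimize and only_expunge_deletes are the endpoint and parameter from the Elasticsearch releases of that era (later versions renamed the endpoint to _forcemerge), so check the docs for your version.

```python
from urllib.parse import quote, urlencode


def optimize_url(base_url, index, only_expunge_deletes=True):
    """Build the URL for an optimize request on one index.

    Hypothetical helper: it only formats the URL, it does not call the cluster.
    """
    # only_expunge_deletes=true asks ES to merge only segments that hold deletes
    params = urlencode({"only_expunge_deletes": str(only_expunge_deletes).lower()})
    return "{}/{}/_optimize?{}".format(base_url.rstrip("/"), quote(index, safe=""), params)


if __name__ == "__main__":
    # Assumed local node and index name, for illustration only
    print(optimize_url("http://localhost:9200", "tenant42"))
```

You would send a POST to that URL with whatever HTTP client you already use.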
So, IMHO, 2/ (rolling indexes) is the more efficient way to do it.
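To make the monthly rotation concrete, here is a small sketch of the bookkeeping only: which index to create, which one to drop, and which ones the search alias should cover. The naming scheme "<tenant>-YYYY.MM", the tenant name and the function names are assumptions for illustration, not something ES requires. The actual create, delete and alias calls are left to your client.

```python
from datetime import date


def month_index(tenant, year, month):
    """Hypothetical naming scheme: <tenant>-YYYY.MM (index names must be lowercase)."""
    return "{}-{:04d}.{:02d}".format(tenant, year, month)


def shift_month(year, month, delta):
    """Return (year, month) moved by delta months; delta may be negative."""
    total = year * 12 + (month - 1) + delta
    return total // 12, total % 12 + 1


def rotation_plan(tenant, new_month_start, keep=4):
    """Plan the rollover that happens when a new month begins.

    keep is the number of monthly indexes behind the alias (4 in the question).
    """
    y, m = new_month_start.year, new_month_start.month
    # Newest first: the new month plus the keep-1 months before it
    alias = [month_index(tenant, *shift_month(y, m, -i)) for i in range(keep)]
    return {
        "create": alias[0],
        # The index that just fell out of the retention window
        "delete": month_index(tenant, *shift_month(y, m, -keep)),
        "alias": alias,
    }


if __name__ == "__main__":
    print(rotation_plan("tenant42", date(2013, 1, 1)))
```

With keep=4, a rollover on 2013-01-01 creates tenant42-2013.01, drops tenant42-2012.09 and leaves the alias pointing at the four most recent months.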
HTH
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs