I am using Curator to manage time-based indices and want to know whether it would be better to use the shrink action or the reindex action.
My cluster has 2 "hot" nodes and one "cold" node. The hot nodes store incoming log data in daily indices (yyyy.mm.dd) for 2 weeks before they are moved to the cold node.
The recent indices have 3 primary shards and 1 replica.
The log data comes in at about 15 million documents per day, taking about 10-12GB of space in the active indices.
Currently, Curator moves indices older than 2 weeks to the cold node and shrinks them to 1 primary shard and 0 replicas. This reduces the index size to about 4-6GB per day.
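One constraint worth keeping in mind with shrink: the target number of primary shards must be a factor of the source count, which is why 3 primaries can be shrunk to 1 but not to 2. The helper below is a hypothetical illustration of that rule, not part of Curator or Elasticsearch; it lists only strictly smaller factors, since shrinking to the same count would be a no-op.

```python
from typing import List


def valid_shrink_targets(source_primaries: int) -> List[int]:
    """Primary-shard counts a shrink could produce from `source_primaries`.

    Shrink requires the target count to be a factor of the source count;
    only counts smaller than the source are listed.
    """
    return [n for n in range(1, source_primaries) if source_primaries % n == 0]


print(valid_shrink_targets(3))  # [1]
print(valid_shrink_targets(6))  # [1, 2, 3]
```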
I know that having lots of shards on a single node can be problematic, and keeping daily indices means every day adds another shard to the cold node, moving it one shard closer to that limit. Is there a better way to move the old indices to the cold node, perhaps grouping them by month rather than by day? My current thought is that this could be done with Curator's reindex action. Is there anything special about the shrink action that I would lose if I switched from shrink to reindex?
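To put numbers on the shard growth, here is a rough count of cold-node shards for daily versus roughly monthly indices, assuming each cold index ends up with 1 primary and 0 replicas as described above. The one-year retention period and the function name are hypothetical; the thread does not say how long data stays on the cold node.

```python
import math


def cold_node_shards(retention_days: int, days_per_index: int, shards_per_index: int = 1) -> int:
    """Shards held on the cold node for a given retention window.

    Assumes every cold index is shrunk to `shards_per_index` primaries
    with 0 replicas. A partially filled period still needs its own index,
    hence the ceiling.
    """
    indices = math.ceil(retention_days / days_per_index)
    return indices * shards_per_index


# Hypothetical one-year retention on the cold node.
print(cold_node_shards(365, 1))   # daily indices: 365 shards
print(cold_node_shards(365, 30))  # ~monthly indices: 13 shards
```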
Thanks for the answer.
I am still concerned about the growing number of shards on the cold node. Would it be beneficial to reindex the shrunk indices into monthly indices at a later point to reduce the shard count? Or is it reasonable to keep hundreds of daily indices with a single shard each?
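If the daily indices were consolidated later, the core of the job is mapping each daily index to a monthly target and then reindexing each group (for example with Curator's reindex action or the `_reindex` API, which accepts a list of source indices; neither is shown here). This sketch covers only the grouping step and assumes daily names of the form `<prefix>-YYYY.MM.DD` and a hypothetical `<prefix>-YYYY.MM` naming for the monthly targets.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches hypothetical daily names such as "logstash-2018.03.14".
DAILY = re.compile(r"^(?P<prefix>.+)-(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})$")


def group_by_month(indices: Iterable[str]) -> Dict[str, List[str]]:
    """Map each monthly target index name to the daily indices it would absorb."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in indices:
        m = DAILY.match(name)
        if m is None:
            continue  # skip anything that is not a daily index
        target = f"{m['prefix']}-{m['year']}.{m['month']}"
        groups[target].append(name)
    # Sort each group so the reindex sources are listed in date order.
    return {target: sorted(days) for target, days in groups.items()}


daily = ["logstash-2018.03.30", "logstash-2018.03.31", "logstash-2018.04.01"]
print(group_by_month(daily))
```

If the shrunk indices carry a prefix or suffix, the regular expression would need to allow for it; as written, such names are skipped.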
How many documents do you have in an index? What is the size of a shard?
I would nip this in the bud by switching to rollover indices rather than daily ones, and only rolling over when a given size is reached, or at the least targeting weekly or monthly sizes.
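The rollover API takes conditions such as `max_age`, `max_docs` and `max_size` (the latter measured on the primary shards), and rolls the alias over to a new index when any one of them is met at the time the rollover is called. The model below is a simplified, hypothetical illustration of that decision, not Elasticsearch's implementation; the 50GB and 30-day thresholds are made up, and the ~6GB of primary data per day comes from the figures in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexStats:
    age_days: float         # time since the index was created
    docs: int               # document count
    primary_size_gb: float  # total size of the primary shards


@dataclass
class RolloverConditions:
    max_age_days: Optional[float] = None
    max_docs: Optional[int] = None
    max_size_gb: Optional[float] = None


def should_roll_over(stats: IndexStats, cond: RolloverConditions) -> bool:
    """Return True as soon as ANY configured condition is met."""
    checks = []
    if cond.max_age_days is not None:
        checks.append(stats.age_days >= cond.max_age_days)
    if cond.max_docs is not None:
        checks.append(stats.docs >= cond.max_docs)
    if cond.max_size_gb is not None:
        checks.append(stats.primary_size_gb >= cond.max_size_gb)
    return any(checks)


# Hypothetical thresholds: roll over at 50GB of primary data or after 30 days.
cond = RolloverConditions(max_age_days=30, max_size_gb=50)
print(should_roll_over(IndexStats(7, 105_000_000, 42.0), cond))  # False
print(should_roll_over(IndexStats(9, 135_000_000, 54.0), cond))  # True
```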
Daily indices have anywhere between 10 and 20 million documents, but that number is expected to grow.
The index data is about 12GB for 15 million documents with 3 primaries and 1 replica, so about 2GB per shard on the active index. Since it gets shrunk to 1 primary, the cold index shards are about 5GB.
I will look into rollover indices, since those seem promising as well.
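The per-shard arithmetic above can be checked with a short calculation: 3 primaries plus 1 replica each gives 6 shard copies, so 12GB works out to about 2GB per shard and about 6GB of primary data. The function name is hypothetical.

```python
def shard_breakdown(total_gb: float, primaries: int, replicas: int):
    """Split an index's on-disk size evenly across its shard copies."""
    copies = primaries * (1 + replicas)         # primaries plus their replicas
    per_shard_gb = total_gb / copies
    primary_data_gb = per_shard_gb * primaries  # one full copy of the data
    return per_shard_gb, primary_data_gb


per_shard, primary_data = shard_breakdown(12, primaries=3, replicas=1)
print(per_shard)     # 2.0 (GB per shard on the hot nodes)
print(primary_data)  # 6.0 (GB of primary data)
```

After shrinking to 1 primary and 0 replicas, that primary data all lands in a single shard, so roughly 6GB is an upper bound that lines up with the 4-6GB reported for the cold indices. By the same arithmetic, a monthly single-shard index built from ~5GB days would hold roughly 150GB in one shard.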
I can see how rollover helps keep shards roughly the same size, but I still need to shrink the old index and have it allocated to the cold node. Right now I use Curator's max-age filter to decide which indices to shrink, because the indices are created daily. With rollover, how can I make sure Curator picks up the index that was just rolled over once the conditions were met? I don't see a filter that identifies indices that were just rolled over.
That's a bit trickier. You can use the alias filter to exclude the index currently associated with the rollover alias, and then use the pattern and count filters to act on the most recent index that matches the pattern.
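The selection described here can be modelled in plain Python with index names as strings. This only mirrors the intent of the pattern, alias and count filters; it is not Curator's code, and the `logs-00000N` names are hypothetical rollover-style names whose zero-padded counters sort chronologically.

```python
from typing import List


def newest_rolled_over(indices: List[str], write_index: str, prefix: str) -> List[str]:
    # Pattern filter: keep indices whose name starts with the prefix.
    candidates = [name for name in indices if name.startswith(prefix)]
    # Alias filter (exclude): drop the index the rollover alias points to.
    candidates = [name for name in candidates if name != write_index]
    # Count filter: sort newest first and act on only the most recent one.
    candidates.sort(reverse=True)
    return candidates[:1]


indices = ["logs-000001", "logs-000002", "logs-000003", "metrics-000001"]
print(newest_rolled_over(indices, write_index="logs-000003", prefix="logs-"))
# ['logs-000002']
```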
Thanks, I did something similar. Since I use a shrink suffix, I can exclude by the suffix pattern instead of using count and sort to pick the most recent index. This grabs any logstash-<logs_index_prefix>-* indices that haven't been shrunk yet and aren't the one associated with the write alias.
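A plain-Python model of this suffix-based selection, roughly corresponding to a prefix pattern filter, a suffix pattern filter with exclude, and an alias filter with exclude. The `-shrink` suffix and the `logstash-app-` prefix are assumptions standing in for the actual configuration; this is not Curator's code.

```python
from typing import List


def unshrunk_candidates(indices: List[str], write_index: str, prefix: str,
                        shrink_suffix: str = "-shrink") -> List[str]:
    """Indices matching the prefix that are neither already shrunk nor the write index."""
    return sorted(
        name
        for name in indices
        if name.startswith(prefix)            # prefix pattern
        and not name.endswith(shrink_suffix)  # exclude already-shrunk indices
        and name != write_index               # exclude the write-alias index
    )


indices = [
    "logstash-app-000001-shrink",  # shrunk on an earlier run
    "logstash-app-000002",         # rolled over, not yet shrunk
    "logstash-app-000003",         # current write index
]
print(unshrunk_candidates(indices, "logstash-app-000003", "logstash-app-"))
# ['logstash-app-000002']
```

Unlike picking a single most-recent index by count, this selects every pending index, so a missed run is caught up on the next one.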