Shrink vs Reindex

Micah_Hunsberger · September 5, 2018, 2:26pm

I am using Curator to manage time-based indices, and would like to know whether it would be better to use the shrink action or the reindex action.

My cluster has 2 "hot" nodes and one "cold" node. The hot nodes store incoming log data in daily indices (yyyy.mm.dd) for 2 weeks before they get moved to the cold node.

The recent indices have 3 primary shards and 1 replica each.

The log data comes in at a rate of about 15 million documents per day, taking about 10-12GB of space on the active indices.

Currently, Curator moves the indices older than 2 weeks to the cold node and shrinks them to 1 primary shard and 0 replicas. This reduces the index size to about 4-6GB per day.
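
For illustration, a setup like the one described above might look roughly like the following Curator action file. This is only a sketch under assumptions: the `logstash-` prefix, the `-archive` suffix, the node name `cold-node-1`, and the `box_type: cold` node attribute are placeholders that would need to match the actual cluster.

    actions:
      1:
        action: shrink
        description: Shrink daily indices older than 14 days onto the cold node.
        options:
          shrink_node: cold-node-1      # assumed name of the cold node
          number_of_shards: 1
          number_of_replicas: 0
          shrink_suffix: '-archive'     # assumed suffix for shrunk indices
          delete_after: True            # drop the source index once the shrink succeeds
          post_allocation:              # keep the shrunk index on the cold tier
            allocation_type: require
            key: box_type               # assumed node attribute
            value: cold
          wait_for_completion: True
        filters:
        - filtertype: pattern
          kind: prefix
          value: logstash-
        - filtertype: pattern           # skip indices that were already shrunk
          kind: suffix
          value: -archive
          exclude: True
        - filtertype: age               # age taken from the date in the index name
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 14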

I know that having lots of shards on a single node can be problematic, and leaving the indices as daily indices means that every day adds another shard to the cold node. I was wondering if there would be a better way to move the old indices to the cold node, perhaps grouping them by month rather than by day. My current thought is that this could be done with the reindex action in Curator. Is there anything special about the shrink action that I would lose if I switched from shrink to reindex?

Thanks

theuntergeek · September 5, 2018, 2:47pm

Shrink will, in nearly all cases, be preferable to a reindex.

Micah_Hunsberger · September 5, 2018, 3:06pm

Thanks for the answer.
I am still concerned about the growing number of shards on the cold node. Would it be beneficial to eventually reindex the shrunk indices into monthly indices at a later time to reduce the shard count? Or is it reasonable to maintain hundreds of daily indices with a single shard each?

theuntergeek · September 5, 2018, 3:08pm

How many documents do you have in an index? What is the size of a shard?

I would nip this in the bud by switching to rollover indices, rather than daily ones, and only rolling over when a given size is reached, or at least targeting weekly or monthly sizes.

Micah_Hunsberger · September 5, 2018, 3:13pm

Daily indices have anywhere between 10 and 20 million documents, but that number is expected to grow.
The index data is about 12GB for 15 million documents, with 3 primaries and 1 replica, so about 2GB per shard on the active index. Since it gets shrunk to 1 primary with no replica, the cold index shards are about 5GB each.

I will look into rollover indices since those seem promising as well.

theuntergeek · September 5, 2018, 3:27pm

Indeed. I wouldn't even bother rolling over until the individual shards are at least 20GB in size. That will reduce your shard count.
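
As a rough sketch of what that could look like with Curator's rollover action: the alias name `logs_write_alias` is a placeholder, and the `60gb` threshold is an assumption derived from 3 primary shards at roughly 20GB each, since `max_size` is measured against the index's primary shards in total rather than per shard. The `max_size` condition needs Elasticsearch 6.1 or later.

    actions:
      1:
        action: rollover
        description: Roll over the write alias once the index is large enough.
        options:
          name: logs_write_alias      # placeholder alias pointing at the write index
          conditions:
            max_size: 60gb            # assumed: 3 primaries x ~20GB per shard

The index behind the alias has to end in a number, such as `-000001`, so that the rollover can generate the next index name.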

Micah_Hunsberger · September 5, 2018, 6:48pm

I can see how the rollover is helpful for making sure shards are roughly the same size, but I still need to shrink the old index and have it allocated to the cold node. Right now with Curator I'm using the max-age filter to determine which indices to shrink, because the indices are created daily. With rollover, how can I pick out the index that was just rolled over once the conditions have been met? I am not seeing a filter that could determine which indices were just rolled over.

theuntergeek · September 5, 2018, 7:08pm

That's a bit trickier. You can use the alias filter to eliminate the index currently associated with the rollover alias, and then use the pattern and count filters to work on the most recent index that matches the pattern.
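
A minimal sketch of that filter chain, assuming a write alias called `logs_write_alias` and indices prefixed with `logstash-` (both placeholders):

    filters:
    - filtertype: alias             # drop the index currently behind the write alias
      aliases: logs_write_alias
      exclude: True
    - filtertype: pattern
      kind: prefix
      value: logstash-
    - filtertype: count             # of what remains, keep only the newest index
      count: 1
      use_age: True
      source: creation_date
      reverse: True                 # newest first
      exclude: False                # keep the counted index instead of removing it

Note that the count filter defaults to `exclude: True`, which would remove the newest index and keep the rest, so it has to be set to `False` here.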

Micah_Hunsberger · September 6, 2018, 3:10pm

Thanks, I did something similar. Since I use a shrink suffix, I can exclude by that suffix pattern rather than doing a count and sort to pick the most recent index. This just grabs any logstash-<logs_index_prefix>-* indices that haven't been shrunk before and are not the one associated with the write alias.

    filters:
    - filtertype: alias
      aliases: logs_write_alias
      exclude: True
    - filtertype: pattern
      kind: prefix
      value: logstash-logs_index_prefix-
    - filtertype: pattern
      kind: suffix
      value: -archive
      exclude: True

