We have an Elasticsearch cluster with 30 data nodes (i3.2xlarge AWS instances).
They store approximately 37TB of data spread across 335 indices (one index per day per app service).
The largest indices reach 800-900GB in size.
All indices have 30 primary + 30 replica = 60 shards.
Since older indices are no longer written to, we would like to take advantage of the Shrink API and a Hot-Warm architecture.
However, while testing shrink performance on an index of approximately 400GB (500M documents), I noticed it took more than 1 hour on a completely idle i3.4xlarge node.
We noticed that CPU utilisation was low during the shrink operation and the node was probably busy doing segment merges.
I am wondering how we could speed up the shrink operation and/or the merge operation.
37TB of data across 335 indices with roughly 20,000 shards gives an average shard size of just below 2GB, which is quite small. We generally recommend aiming for an average shard size in the tens of GB for use cases with time-based indices. Having 60 shards for all indices, irrespective of expected daily volume, seems wasteful and inefficient. I assume that this is done to make sure primary and replica shards can be spread evenly across the nodes. As primary and replica shards mostly do the same amount of work, having 15 primary and 15 replica shards per index would probably give an equally good spread. Small indices may not need to be spread evenly across all nodes. One shard for a small index on one node may be roughly equivalent to a small shard for a different index on another node.
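To make the arithmetic above concrete, here is a small sketch in Python. It only uses the figures quoted in this thread; the helper names, the 30GB target shard size and the 450GB example index are illustrative assumptions, not recommendations for this particular cluster.

```python
import math

# Figures quoted in this thread.
TOTAL_DATA_GB = 37 * 1000   # approx. 37TB, treating 1TB as 1000GB
INDEX_COUNT = 335
SHARDS_PER_INDEX = 60       # 30 primaries + 30 replicas


def average_shard_size_gb(total_gb, index_count, shards_per_index):
    """Average shard size if the data were spread evenly over all shards."""
    return total_gb / (index_count * shards_per_index)


def primaries_for_target(primary_data_gb, target_shard_gb):
    """Smallest primary shard count that keeps each primary at or below the target size."""
    return max(1, math.ceil(primary_data_gb / target_shard_gb))


if __name__ == "__main__":
    avg = average_shard_size_gb(TOTAL_DATA_GB, INDEX_COUNT, SHARDS_PER_INDEX)
    print(f"total shards: {INDEX_COUNT * SHARDS_PER_INDEX}")
    print(f"average shard size: {avg:.2f} GB")
    # Hypothetical example: 450GB of primary data, 30GB target shard size.
    print(f"primaries needed: {primaries_for_target(450, 30)}")
```

The second helper shows how a shard count could be derived from an expected daily volume instead of being fixed at 30 for every index.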
It might be better to right-size the number of shards at the outset and thereby avoid having to use the shrink index API. If daily volumes are difficult to predict or vary a lot, you may want to look into using the rollover index API, which allows you to target a certain size for an index rather than a specific time period.
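As a rough illustration of the rollover idea, the sketch below builds (but does not send) a rollover request body with `max_docs` and `max_age` conditions. The alias name `logs-write`, the 450GB target and the helper names are hypothetical. The document count is scaled from the 400GB / 500M-document index mentioned earlier in the thread, which assumes a similar average document size; whether a direct size-based condition is available depends on the Elasticsearch version, so please check the rollover documentation for your release.

```python
import json


def docs_for_target_size(target_index_gb, sample_index_gb, sample_doc_count):
    """Scale a measured document count to a desired index size.

    Assumes the average document size stays roughly the same.
    """
    return int(target_index_gb * sample_doc_count / sample_index_gb)


def rollover_request(write_alias, max_docs, max_age):
    """Return (method, path, body) for a rollover call; nothing is sent here."""
    body = {"conditions": {"max_docs": max_docs, "max_age": max_age}}
    return "POST", f"/{write_alias}/_rollover", body


if __name__ == "__main__":
    # Sample figures from the thread: approx. 400GB holding 500M documents.
    max_docs = docs_for_target_size(450, 400, 500_000_000)
    method, path, body = rollover_request("logs-write", max_docs, "1d")
    print(method, path)
    print(json.dumps(body, indent=2))
```

The index rolls over as soon as any one condition is met, so `max_age` acts as an upper bound on how long a quiet index stays open for writes.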
Force merge is an expensive operation, and I am not aware of any ways to speed it up. You may want to make sure you run force merge during off-peak hours, at least for larger indices.
Let me clarify our case further. We have, say, "main" or "important" indices. These range from 300GB to 900GB in size and they all have 30*2 shards. We also have some very small (and not so important) indices with just a few shards and, for some of them, no replicas. They are small both in size and in number of documents, and we normally do not search them at all. I just did not realize how many of these indices we have and how they could skew the real picture. I took the total number of indices (335) from cerebro without giving it much thought. My apologies for that. From now on, we will talk about important indices only.
You are absolutely correct that we try to maintain an even distribution of data across all nodes. We also want to make sure we can accommodate a huge spike in the number of generated logs and increase the indexing rate as necessary without hitting back pressure. We deliver logs to Elasticsearch through Kafka, and if there is Kafka lag for whatever reason, we want to close the lag as fast as possible. Just a few days ago we reached an indexing rate of 40K+/sec in a single index when that happened. That's why we thought it would be best to have as many primary shards as there are data nodes in the cluster.
When the indices get old and are no longer being written to, we want to optimize them for search.
The idea is to move them to an idle node and shrink them there one by one. Naturally, we want to keep that idle node as busy as possible during the shrink, so the question is what settings we could change to achieve that. Right now the shrink takes too much time and the node seems to be underutilised during this process.
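For reference, the workflow described above could be sketched roughly as follows. The snippet only builds the two request payloads (relocating a copy of every shard to one node and blocking writes, then shrinking); it does not send anything. The index names, the node name `warm-node-1` and the helper names are hypothetical, and the settings shown are the ones I understand the Shrink API documentation to require, so please verify them against the docs for your version. The target primary count has to be a factor of the source primary count, which is what the first helper checks.

```python
def valid_shrink_targets(source_primaries):
    """Primary shard counts a source index can be shrunk to (proper factors)."""
    return [n for n in range(1, source_primaries) if source_primaries % n == 0]


def shrink_plan(source_index, target_index, node_name,
                source_primaries, target_primaries, replicas=1):
    """Return the (method, path, body) requests for a shrink; nothing is sent."""
    if target_primaries not in valid_shrink_targets(source_primaries):
        raise ValueError(
            f"{target_primaries} is not a factor of {source_primaries}"
        )
    # Step 1: move a copy of every shard to one node and make the index read-only.
    prepare = ("PUT", f"/{source_index}/_settings", {
        "settings": {
            "index.routing.allocation.require._name": node_name,
            "index.blocks.write": True,
        }
    })
    # Step 2: shrink into a new index with fewer primaries.
    shrink = ("POST", f"/{source_index}/_shrink/{target_index}", {
        "settings": {
            "index.number_of_shards": target_primaries,
            "index.number_of_replicas": replicas,
        }
    })
    return [prepare, shrink]


if __name__ == "__main__":
    print(valid_shrink_targets(30))
    for method, path, body in shrink_plan(
            "logs-2017.01.01", "logs-2017.01.01-shrunk", "warm-node-1", 30, 5):
        print(method, path, body)
```

With 30 primaries, the possible targets are 1, 2, 3, 5, 6, 10 and 15, so the target size per shard can be picked from that list.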
I found some tips about changing the "indices.store.throttle.max_bytes_per_sec", "indices.store.throttle.type" and "index.merge.scheduler.max_thread_count" settings, but they seem to apply to 1.x / 2.x versions only.
I am wondering if there are similar settings in 5.x.
Thank you very much for all of your help!