I'd like to know which factors most affect the time it takes to open a closed shard. I've run a few tests, but the results look unreasonable to me:
Number of shards, Size of each shard, Opening time
50, 10 MB, 5 s
50, 150 MB, 12 s
50, 2 GB, 8 s
I expected the opening time to grow with shard size, but my results do not show that. I ran the test multiple times and got similar results each time.
Could anyone help me identify which factors determine the time required to open a closed shard?
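One way to make such measurements more repeatable is to time each open from the client side, wait until the index reports green, and take the median over several close/open cycles. The sketch below assumes a hypothetical `send(method, path)` transport that returns the parsed JSON body (for example, a thin wrapper around an HTTP client); the index name and the 120 s timeout are illustrative only.

```python
import statistics
import time
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes (HTTP method, path) and returns the parsed JSON body.
Transport = Callable[[str, str], Dict]


def time_open(index: str, send: Transport,
              clock: Callable[[], float] = time.monotonic) -> Tuple[float, float]:
    """Open a closed index; return (seconds until _open returned, seconds until green)."""
    start = clock()
    send("POST", f"/{index}/_open")
    acknowledged = clock()
    # Block until all primary and replica shards of this index are allocated.
    health = send("GET", f"/_cluster/health/{index}?wait_for_status=green&timeout=120s")
    green = clock()
    if health.get("status") != "green":
        raise RuntimeError(f"{index} is {health.get('status')}, not green")
    return acknowledged - start, green - start


def median_open_seconds(index: str, send: Transport, trials: int = 5,
                        clock: Callable[[], float] = time.monotonic) -> float:
    """Close and reopen the index `trials` times; return the median time to green."""
    samples = []
    for _ in range(trials):
        send("POST", f"/{index}/_close")
        samples.append(time_open(index, send, clock)[1])
    return statistics.median(samples)
```

On a single-node test cluster, indices with replicas never turn green, so either set replicas to 0 for the test or wait for `yellow` instead.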
I would expect the opening time to depend on the mappings as well as on the number of segments. Certain types of mappings require data structures to be built in memory, which can take time. Are there any major differences in mappings and/or segment count between your indices?
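To compare segment counts between the test indices, the JSON output of `GET /_cat/segments?format=json` can be tallied per shard copy. A minimal sketch, assuming the response body has already been fetched as a string; run it before closing the indices.

```python
import json
from collections import Counter
from typing import Dict, List, Tuple


def segments_per_shard(cat_segments_json: str) -> Dict[Tuple[str, str, str], int]:
    """Count segments per (index, shard number, primary/replica) copy
    from the body of GET /_cat/segments?format=json."""
    rows: List[dict] = json.loads(cat_segments_json)
    # Each row describes one segment; group rows by the shard copy they belong to.
    counts = Counter((row["index"], row["shard"], row["prirep"]) for row in rows)
    return dict(counts)
```

Shards with many segments are the first candidates to explain slower opening.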
Thanks Christian,
I have filled the shards with the same data.
They have the same mappings.
I think the number of segments depends on the shard size, i.e. larger shards have more segments. Is that true?
I cannot explain the differences, but the tests are not very realistic anyway. All those indices are so small that they really should have only a single primary shard. What are you looking to achieve? What is the purpose of this test?
Let me describe our final goal.
We use Elasticsearch to store time-series data, and we need to retain it for a fixed period (e.g., 12 months). We have enough storage for our data, but we do not have enough RAM to follow the recommendation of this post (i.e., 40 GB shards and 1 GB of JVM heap per 20 shards). Therefore, we decided to use frozen indices for some of the old data (e.g., 5 months of data), or to close them manually and open them only when a query targets them.
Now we want to know the optimal shard size for opening our closed indices as quickly as possible.
P.S. I also ran similar tests with larger shards (e.g., 20 GB), but I did not find any relation between the size of a shard and the time required to open it.
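The close-and-reopen-on-demand approach described above can be sketched as a single helper. This assumes a hypothetical `send(method, path, body)` transport returning parsed JSON; the helper name, index name, and timeout are illustrative, not part of any client library.

```python
from typing import Callable, Dict, Optional

# Hypothetical transport: (HTTP method, path, JSON body or None) -> parsed JSON response.
Transport = Callable[[str, str, Optional[Dict]], Dict]


def search_closed_index(index: str, query: Dict, send: Transport) -> Dict:
    """Open a closed index, run a single search, then close it again."""
    send("POST", f"/{index}/_open", None)
    try:
        # Wait until the primaries are assigned so the search can be served.
        send("GET", f"/_cluster/health/{index}?wait_for_status=yellow&timeout=60s", None)
        return send("POST", f"/{index}/_search", {"query": query})
    finally:
        # Close again even if the search failed, so the shards stop using heap.
        send("POST", f"/{index}/_close", None)
```

Every query against an archived index pays the full opening time, which is why the opening time matters for this design.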
I am not sure I follow the reasoning. If you want to optimize heap usage, you should make your shards as large as possible, as larger shards generally use less heap per document.
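The arithmetic behind this can be sketched with the rule of thumb quoted earlier in the thread (at most about 20 shards per GB of heap). This only expresses that rule; actual heap use per shard depends on mappings and segments, and the data volume in the usage note is hypothetical.

```python
import math

# Rule of thumb quoted earlier in this thread: at most ~20 shards per GB of JVM heap.
MAX_SHARDS_PER_GB_HEAP = 20


def shard_count(total_data_gb: float, shard_size_gb: float) -> int:
    """Number of shards needed to hold the data at the given target shard size."""
    return math.ceil(total_data_gb / shard_size_gb)


def min_heap_gb(total_data_gb: float, shard_size_gb: float) -> float:
    """Smallest total heap (GB, across all nodes) that keeps the shard count within the rule."""
    return shard_count(total_data_gb, shard_size_gb) / MAX_SHARDS_PER_GB_HEAP
```

For a hypothetical 6000 GB of data, 2 GB shards mean 3000 shards and at least 150 GB of heap, while 40 GB shards mean 150 shards and at least 7.5 GB of heap.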
I would further recommend setting aside dedicated nodes for long-term storage and using frozen indices rather than closing them. Make sure you force merge down to a single segment before you close or freeze the indices, as that will make opening them faster. Have a look at this webinar for additional details. This will allow you to open and search large shards reasonably quickly.
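The force-merge-then-freeze sequence can be sketched as below, assuming a hypothetical `send(method, path)` transport. The `_freeze` endpoint belongs to the 6.6 to 7.x releases this thread dates from (it was deprecated in 7.14 and removed in 8.0), so on newer versions only the close branch applies. Force merge only indices that no longer receive writes.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: takes (HTTP method, path) and returns the parsed JSON body.
Transport = Callable[[str, str], Dict]


def archive_index(index: str, send: Transport, freeze: bool = True) -> List[Tuple[str, str]]:
    """Force merge an index to one segment per shard, then freeze or close it."""
    steps = [
        # Merge each shard down to a single segment while the index is still open.
        ("POST", f"/{index}/_forcemerge?max_num_segments=1"),
        # Then freeze it (6.6 to 7.x only) or close it.
        ("POST", f"/{index}/_freeze" if freeze else f"/{index}/_close"),
    ]
    for method, path in steps:
        send(method, path)
    return steps
```

A force merge on large shards can run for a long time, so the HTTP client's timeout should be set generously.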